Basic applications of regular expressions

Regular expressions are often used to verify whether a known string meets specific requirements, such as whether it is a number, whether it is an all-lowercase string, and so on.

A regular expression (regex for short) is a string used to describe a pattern that matches a set of strings. Regular expressions can be used to match, replace, and split strings.

1. Matching strings

Regular expressions are commonly used with the matches method of the String class, which is very similar to the equals method. For example, both of the following statements evaluate to true.

"Java".matches("Java");
"Java".equals("Java");

However, the matches method is more powerful. It can match not only a fixed string but also any string in the set described by a pattern, that is, a regular expression. For example, the following statements all evaluate to true.

"Java is fun".matches("Java.*");
"Java is cool".matches("Java.*");
"Java is powerful".matches("Java.*");

The “Java.*” in these statements is a regular expression. It describes a string that begins with Java followed by zero or more characters. The syntax is explained in detail in the next section.
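One point worth illustrating is that matches tests the entire string, not a part of it. The following is a minimal sketch; the class name MatchesDemo and the helper startsWithJava are illustrative names, not part of the Java API.

```java
class MatchesDemo {
    // Hypothetical helper: true when the whole text is "Java" followed by anything.
    static boolean startsWithJava(String text) {
        return text.matches("Java.*");
    }

    public static void main(String[] args) {
        System.out.println(startsWithJava("Java is fun"));  // true
        // matches must cover the entire string, so a pattern that fits
        // only the end of the text does not succeed.
        System.out.println(startsWithJava("I like Java"));  // false
        // Likewise, the pattern "Java" alone does not match a longer string.
        System.out.println("Java is fun".matches("Java"));  // false
    }
}
```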

2. Regular expression syntax

Regular expressions are composed of literal characters and special symbols, and the common syntax of regular expressions is listed below.

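NOTE: A backslash is a special character that begins an escape sequence in a Java string literal. So you need to write \\ to represent a single \ in Java. For example, the regular expression \d is written as "\\d" in Java source code.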
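The escaping rule can be checked with a short sketch. The class name EscapeDemo and the constant DIGIT are illustrative only.

```java
class EscapeDemo {
    // The Java literal "\\d" holds two characters, a backslash and d,
    // which the regex engine reads as "one digit".
    static final String DIGIT = "\\d";

    public static void main(String[] args) {
        System.out.println(DIGIT.length());      // 2
        System.out.println("7".matches(DIGIT));  // true
        // To match a literal backslash in the input, the regex itself needs
        // two backslashes, which takes four in Java source code.
        System.out.println("a\\b".matches("a\\\\b"));  // true
    }
}
```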

// x the specified character x
"Java".matches("Java");

// . any single character, except newline
"Java".matches("J..a");

// (ab|cd) ab or cd
"ten".matches("t(en|im)");

// [abc] a, b or c
"Java".matches("Ja[uvwx]a");

// [^abc] any character except a, b or c
"Java".matches("Ja[^ars]a");

// [a-z] a to z
"Java".matches("[A-Z]av[a-d]");

// [^a-z] any character except a to z
"Java".matches("Jav[^b-d]");

// [a-e[m-p]] a to e or m to p
"Java".matches("[A-G[I-M]]av[a-d]");

// [a-e&&[c-p]] intersection of a to e and c to p
"Java".matches("[A-P&&[I-M]]av[a-d]");

// \d one digit, equivalent to [0-9]
"Java2".matches("Java[\\d]");

// \D a non-digit
"$Java".matches("[\\D][\\D]ava");

// \w word character, that is, any letter, digit or underscore character
"Java1".matches("[\\w]ava[\\d]");

// \W non-word character
"$Java".matches("[\\W][\\w]ava");

// \s whitespace character
"Java 2".matches("Java\\s2");

// \S non-whitespace character
"Java".matches("[\\S]ava");

// p* 0 or more occurrences of pattern p
"aaaa".matches("a*");
"abab".matches("(ab)*");

// p+ 1 or more occurrences of pattern p
"a".matches("a+b*");
"able".matches("(ab)+.*");

// p? 0 or 1 occurrence of pattern p
"Java".matches("J?Java");
"ava".matches("J?ava");

// p{n} exactly n occurrences of pattern p
"Java".matches("Ja{1}.*");
"Java".matches(".{2}"); //false

// p{n,} at least n occurrences of pattern p
"aaaa".matches("a{1,}");
"a".matches("a{2,}"); //false

// p{n,m} between n and m (inclusive) occurrences of pattern p
"aaaa".matches("a{1,9}");
"abb".matches("a{2,9}bb"); //false

// \p{P} a punctuation character such as ! " # % & ' ( ) * , - . / : ; ? @ [ \ ] _ { }
// (\p{Punct} additionally covers symbols such as $ + < = > ^ ` | ~)
"J?a".matches("J\\p{P}a");
"Jxa".matches("J\\p{P}a"); //false

  • A word character is any letter, digit, or underscore character. So \w is equivalent to [a-z[A-Z][0-9]_] or simply [a-zA-Z0-9_].
  • The six entries *, +, ?, {n}, {n,}, and {n,m} are called quantifiers. They specify how many times the pattern before the quantifier is repeated.
  • Do not use whitespace in repeating quantifiers. For example, A{3,6} cannot be written as A{3, 6} with a space after the comma.
  • Patterns can be grouped using parentheses. For example, (ab){3} matches ababab, but ab{3} matches abbb.
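The difference between a grouped and an ungrouped quantifier can be sketched as follows. The class and method names are illustrative only.

```java
class QuantifierDemo {
    // The quantifier applies to the whole group (ab).
    static boolean grouped(String s) {
        return s.matches("(ab){3}");
    }

    // The quantifier applies only to the character b.
    static boolean ungrouped(String s) {
        return s.matches("ab{3}");
    }

    public static void main(String[] args) {
        System.out.println(grouped("ababab"));    // true
        System.out.println(grouped("abbb"));      // false
        System.out.println(ungrouped("abbb"));    // true
        System.out.println(ungrouped("ababab"));  // false
    }
}
```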

Let’s demonstrate how to build regular expressions with some examples.

Example 1

The pattern of a social security number is xxx-xx-xxxx, where x is a single digit, and its regular expression can be described as

[\d]{3}-[\d]{2}-[\d]{4}
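As a minimal sketch, this pattern could be used in Java as follows. The backslashes are doubled because the pattern sits inside a string literal; the class name SsnDemo and the method isSsn are illustrative names.

```java
class SsnDemo {
    // Hypothetical helper: checks the xxx-xx-xxxx layout only,
    // not whether the number was ever issued.
    static boolean isSsn(String s) {
        return s.matches("[\\d]{3}-[\\d]{2}-[\\d]{4}");
    }

    public static void main(String[] args) {
        System.out.println(isSsn("123-45-6789"));  // true
        System.out.println(isSsn("123456789"));    // false: hyphens missing
        System.out.println(isSsn("123-45-678"));   // false: last group too short
    }
}
```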

Example 2

Even numbers end with the digit 0, 2, 4, 6, or 8. The pattern for even numbers can be described as

[\d]*[02468]
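When the same pattern is tested many times, it can be compiled once with java.util.regex.Pattern instead of calling String.matches repeatedly. The sketch below assumes the input is an unsigned decimal string; the names EvenDemo and isEven are illustrative.

```java
import java.util.regex.Pattern;

class EvenDemo {
    // Compiled once and reused for every check.
    private static final Pattern EVEN = Pattern.compile("[\\d]*[02468]");

    // Hypothetical helper: true when the digit string ends in an even digit.
    static boolean isEven(String digits) {
        return EVEN.matcher(digits).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEven("8"));     // true
        System.out.println(isEven("1234"));  // true
        System.out.println(isEven("135"));   // false
        System.out.println(isEven(""));      // false: at least one digit is required
    }
}
```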

Example 3

The pattern of a phone number is (xxx)xxx-xxxx, where x is a single digit and the first digit cannot be 0. Its regular expression can be described as

\([1-9][\d]{2}\)[\d]{3}-[\d]{4}

Parentheses ( and ) are special characters in regular expressions and are used to group patterns. To represent a literal ( or ) in a regular expression, \( and \) must be used.
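In Java source code, the escaped parentheses become \\( and \\). A minimal sketch, with the illustrative names PhoneDemo and isPhone:

```java
class PhoneDemo {
    // Hypothetical helper: checks the (xxx)xxx-xxxx layout with a nonzero first digit.
    static boolean isPhone(String s) {
        return s.matches("\\([1-9][\\d]{2}\\)[\\d]{3}-[\\d]{4}");
    }

    public static void main(String[] args) {
        System.out.println(isPhone("(912)921-2728"));  // true
        System.out.println(isPhone("(012)921-2728"));  // false: first digit is 0
        System.out.println(isPhone("912-921-2728"));   // false: parentheses missing
    }
}
```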

Example 4

The last name is assumed to consist of a maximum of 25 letters with the first letter capitalized. Then the pattern of surnames can be described as

[A-Z][a-zA-Z]{1,24}
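A short sketch of this pattern at its boundaries follows. Note that, as written, the quantifier {1,24} requires at least one letter after the capital, so a one-letter name does not match. The names LastNameDemo and isLastName are illustrative.

```java
class LastNameDemo {
    // Hypothetical helper: one capital letter followed by 1 to 24 more letters.
    static boolean isLastName(String s) {
        return s.matches("[A-Z][a-zA-Z]{1,24}");
    }

    public static void main(String[] args) {
        System.out.println(isLastName("Smith"));  // true
        System.out.println(isLastName("smith"));  // false: first letter not capitalized
        System.out.println(isLastName("A"));      // false: {1,24} needs a second letter
        System.out.println(isLastName("Abcdefghijklmnopqrstuvwxy"));   // true: 25 letters
        System.out.println(isLastName("Abcdefghijklmnopqrstuvwxyz"));  // false: 26 letters
    }
}
```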

Example 5

The pattern of a Java identifier can be described as

[a-zA-Z_$][\w$]*
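The sketch below tries this pattern on a few candidates. It checks the shape of the name only, so a reserved word such as class also passes; the names IdentifierDemo and looksLikeIdentifier are illustrative.

```java
class IdentifierDemo {
    // Hypothetical helper: a letter, underscore or $ first,
    // then any number of word characters or $.
    static boolean looksLikeIdentifier(String s) {
        return s.matches("[a-zA-Z_$][\\w$]*");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("total_count"));  // true
        System.out.println(looksLikeIdentifier("$value"));       // true
        System.out.println(looksLikeIdentifier("2fast"));        // false: starts with a digit
        System.out.println(looksLikeIdentifier("class"));        // true: keywords are not excluded
    }
}
```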

3. Replacing and splitting strings

The String class also contains replaceAll, replaceFirst, and split methods for replacing and splitting strings.

The replaceAll method replaces all matching substrings, and the replaceFirst method replaces the first matching substring.

System.out.println("Java Java Java".replaceAll("v\\w", "wi")); //Display Jawi Jawi Jawi
System.out.println("Java Java Java".replaceFirst("v\\w", "wi")); //Display Jawi Java Java
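A common use of replaceAll is collapsing runs of whitespace. The following sketch contrasts it with replaceFirst on the same input; the class and method names are illustrative.

```java
class WhitespaceDemo {
    // Replaces every run of whitespace with a single space.
    static String collapseAll(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // Replaces only the first run of whitespace.
    static String collapseFirst(String s) {
        return s.replaceFirst("\\s+", " ");
    }

    public static void main(String[] args) {
        String text = "Java   is    fun";
        System.out.println(collapseAll(text));    // Java is fun
        System.out.println(collapseFirst(text));  // Java is    fun
    }
}
```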

There are two overloaded split methods. The split(regex) method splits a string into substrings, using the matches of regex as delimiters. For example, the following statement

String[] tokens = "Java1HTML2Perl".split("\\d");

splits the string “Java1HTML2Perl” into Java, HTML, and Perl and stores them in tokens[0], tokens[1], and tokens[2].
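The statement above can be run as a small program. The sketch also shows that split(regex) drops trailing empty strings; the names SplitDemo and splitOnDigits are illustrative.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: uses every single digit as a delimiter.
    static String[] splitOnDigits(String s) {
        return s.split("\\d");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDigits("Java1HTML2Perl")));  // [Java, HTML, Perl]
        // The empty string after the final delimiter is discarded.
        System.out.println(Arrays.toString(splitOnDigits("a1b2")));            // [a, b]
    }
}
```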

In the split(regex, limit) method, the limit parameter determines how many times the pattern is applied. If limit is 0, split(regex, limit) is equivalent to split(regex): the pattern is applied as many times as possible and trailing empty strings are discarded. If limit < 0, the pattern is also applied as many times as possible, but trailing empty strings are kept. If limit > 0, the pattern is applied at most limit - 1 times, so the result has at most limit elements. Here are some examples.

"Java1HTML2Perl".split("\\d", 0); // split into Java, HTML, Perl
"Java1HTML2Perl".split("\\d", 1); // split into Java1HTML2Perl (the pattern is not applied)
"Java1HTML2Perl".split("\\d", 2); // split into Java, HTML2Perl
"Java1HTML2Perl".split("\\d", 3); // split into Java, HTML, Perl
"Java1HTML2Perl".split("\\d", 4); // split into Java, HTML, Perl
"Java1HTML2Perl".split("\\d", 5); // split into Java, HTML, Perl

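The effect of the limit argument can be checked with a short sketch. The class name SplitLimitDemo and the helper splitOnDigits are illustrative names.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: splits on single digits with the given limit.
    static String[] splitOnDigits(String s, int limit) {
        return s.split("\\d", limit);
    }

    public static void main(String[] args) {
        // limit 2: the pattern is applied once; the rest stays in the last element.
        System.out.println(Arrays.toString(splitOnDigits("Java1HTML2Perl", 2)));  // [Java, HTML2Perl]
        // limit 0 discards the trailing empty string, a negative limit keeps it.
        System.out.println(splitOnDigits("a1b2", 0).length);   // 2
        System.out.println(splitOnDigits("a1b2", -1).length);  // 3
    }
}
```
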
By default, all quantifiers are “greedy”. This means they match as many times as possible. For example, the following statement displays JRvaa because the first successful match is aaa.

System.out.println("Jaaavaa".replaceFirst("a+", "R")); //Display JRvaa

The default behavior of a quantifier can be changed by appending a question mark (?) to it. The quantifier then becomes “reluctant” or “lazy”, meaning it matches as few times as possible. For example, the following statement displays JRaavaa because the first successful match is a.

System.out.println("Jaaavaa".replaceFirst("a+?", "R")); //Display JRaavaa
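The difference matters most when a pattern has a closing delimiter. The following sketch removes a tag-like prefix with a greedy and a reluctant quantifier; the class name, method names, and sample text are illustrative.

```java
class GreedyDemo {
    // Greedy: .+ runs to the last '>' in the text.
    static String stripGreedy(String s) {
        return s.replaceFirst("<.+>", "");
    }

    // Reluctant: .+? stops at the first '>' it can.
    static String stripReluctant(String s) {
        return s.replaceFirst("<.+?>", "");
    }

    public static void main(String[] args) {
        String text = "<b>bold</b>";
        System.out.println("[" + stripGreedy(text) + "]");     // []
        System.out.println("[" + stripReluctant(text) + "]");  // [bold</b>]
    }
}
```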
