In-Depth Guide to Java Regular Expressions
Introduction
Regular expressions are a powerful text-matching tool, widely used for operations such as string search and replacement. In Java, regular expressions are used through the Pattern and Matcher classes. This article takes a closer look at the symbols and patterns that make up a regular expression, including braces, parentheses, square brackets, and the start and end anchors.
Components of regular expressions
1. Character class
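- Brackets [ ]: Define a character set. For example, [abc] matches “a”, “b”, or “c”.
- Predefined character classes: such as \d (any digit), \s (a whitespace character), and \w (a word character: a letter, digit, or underscore). In a Java string literal the backslash must be doubled, for example "\\d".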
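The character classes above can be tried out with a small helper. This is a minimal sketch; the class name CharacterClassDemo, the helper findAll, and the sample input are illustrative assumptions, not part of the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "a1 b_2";
        System.out.println(findAll("[abc]", input)); // [a, b]
        System.out.println(findAll("\\d", input));   // [1, 2]
        System.out.println(findAll("\\s", input));   // [ ]
        System.out.println(findAll("\\w", input));   // [a, 1, b, _, 2]
    }
}
```

Note that \w also matches the underscore, which is easy to overlook when it is described as "alphanumeric".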
2. Quantifier
- Asterisk *: Zero or more matches.
- Plus sign +: One or more matches.
- Question mark ?: Zero or one match.
- Braces { }: Customized number of matches. For example, X{2} (X appears exactly twice), X{2,} (at least twice), X{2,5} (two to five times).
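The quantifiers above can be checked against whole strings with Pattern.matches. This is a small sketch; the class name QuantifierDemo and the helper fullMatch are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("X{2}", "XX"));       // true
        System.out.println(fullMatch("X{2}", "XXX"));      // false: exactly two required
        System.out.println(fullMatch("X{2,}", "XXXXXX"));  // true: at least two
        System.out.println(fullMatch("X{2,5}", "XXXXXX")); // false: six is more than five
        System.out.println(fullMatch("X*", ""));           // true: zero occurrences allowed
        System.out.println(fullMatch("X+", ""));           // false: at least one required
        System.out.println(fullMatch("X?", "X"));          // true: zero or one
    }
}
```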
3. Boundary matching characters
- Caret ^: Matches the beginning of the input string (or of each line when Pattern.MULTILINE is set).
- Dollar sign $: Matches the end of the input string (or of each line when Pattern.MULTILINE is set).
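To see how the boundary characters behave with and without Pattern.MULTILINE, a short sketch can count matches in a multi-line string. The class name BoundaryDemo, the helper countMatches, and the sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: counts how many times the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The start\nThe middle\nthe end";
        // Default mode: ^ matches only at the very beginning of the input.
        System.out.println(countMatches(Pattern.compile("^The"), text)); // 1
        // MULTILINE: ^ also matches right after each line terminator.
        System.out.println(countMatches(Pattern.compile("^The", Pattern.MULTILINE), text)); // 2
        // $ matches at the end of the input.
        System.out.println(countMatches(Pattern.compile("end$"), text)); // 1
    }
}
```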
4. Grouping and capturing
- Parentheses ( ): Mark the beginning and end of a subexpression and capture the text it matches. For example, (abc) matches “abc” and stores it as group 1.
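As a small illustration of capturing, the sketch below splits a key=value string into its two parts. The class name GroupDemo, the helper parsePair, and the key=value format are hypothetical choices for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Two capturing groups: one for the key, one for the value.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    // Hypothetical helper: returns {key, value} for input such as "color=blue",
    // or null if the whole input does not have that shape.
    static String[] parsePair(String input) {
        Matcher matcher = PAIR.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        // group(0) is the whole match; groups 1 and 2 are the parenthesized parts.
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] pair = parsePair("color=blue");
        System.out.println(pair[0] + " -> " + pair[1]);        // color -> blue
        System.out.println(parsePair("no pair here") == null); // true
    }
}
```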
5. Special structure
- Non-capturing group (?: ): Groups the expression inside the parentheses without capturing the matched text.
- Positive lookahead assertion (?= ): Succeeds if the characters that follow match the expression inside the parentheses. The lookahead itself consumes no characters.
- Negative lookahead assertion (?! ): Succeeds if the characters that follow do not match the expression inside the parentheses.
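The three constructs above can be compared side by side. This is a sketch under assumptions: the class name LookaheadDemo, the helper findAll, and the "px"/"em" sample input are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sizes = "10px 20em 30px";
        // Positive lookahead: numbers followed by "px"; "px" is not part of the match.
        System.out.println(findAll("\\d+(?=px)", sizes));     // [10, 30]
        // Negative lookahead: numbers NOT followed by "px". The \d alternative
        // stops the engine from backtracking to a shorter number such as "1".
        System.out.println(findAll("\\d+(?!\\d|px)", sizes)); // [20]
        // A capturing group adds a group; a non-capturing group does not.
        System.out.println(Pattern.compile("a(bc)*").matcher("").groupCount());   // 1
        System.out.println(Pattern.compile("a(?:bc)*").matcher("").groupCount()); // 0
    }
}
```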
Practical application examples
Example 1: Using braces
// Two to four consecutive digits
Pattern pattern = Pattern.compile("\\d{2,4}");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}
Example 2: Grouping using parentheses
// Group 1: one or more digits; group 2: a single lowercase letter
Pattern pattern = Pattern.compile("(\\d+)([a-z])");
Matcher matcher = pattern.matcher("123abc");
while (matcher.find()) {
    System.out.println("Group 1: " + matcher.group(1));
    System.out.println("Group 2: " + matcher.group(2));
}
Example 3: Using boundary matches
Pattern pattern = Pattern.compile("^The");
Matcher matcher = pattern.matcher("The end");
if (matcher.find()) {
    System.out.println("Match found at the start of string");
}
Example 4: Using non-capturing groups
Pattern pattern = Pattern.compile("a(?:bc)*");
Matcher matcher = pattern.matcher("abcbcbc");
if (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}
Example 5: Using positive lookahead assertions
// A digit, but only when the next character is a non-digit
Pattern pattern = Pattern.compile("\\d(?=\\D)");
Matcher matcher = pattern.matcher("123a");
while (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}
More code sample collections
1. Match Chinese characters
Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
// Sample text containing Chinese characters
Matcher matcher = pattern.matcher("这是一个中文测试");
if (matcher.find()) {
    System.out.println("Contains Chinese characters");
}
2. Match double-byte characters (including Chinese characters)
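// Any character outside the range U+0000 to U+00FF
Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
// Only the two Chinese characters match; "ABC" and the space do not
Matcher matcher = pattern.matcher("测试 ABC");
while (matcher.find()) {
    System.out.println("Double-byte character found: " + matcher.group());
}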
3. Match empty lines
Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("First line\n\nThird line");
while (matcher.find()) {
    System.out.println("Empty line found at index: " + matcher.start());
}
4. Match HTML tags
// Quotes inside the pattern must be escaped in the Java string literal
Pattern pattern = Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");
Matcher matcher = pattern.matcher("<html><head></head><body></body></html>");
while (matcher.find()) {
    System.out.println("HTML tag found: " + matcher.group());
}
5. Match leading and trailing spaces (remove leading and trailing spaces)
String input = "  Hello World  ";
String result = input.replaceAll("^\\s+|\\s+$", "");
System.out.println("Trimmed String: " + result);
6. Match IP address
// Four dot-separated groups of one to three digits (the 0-255 range is not checked)
Pattern pattern = Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");
Matcher matcher = pattern.matcher("192.168.1.1");
if (matcher.find()) {
    System.out.println("Valid IP address: " + matcher.group());
}
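The pattern above only checks the shape of the address, so a string such as 999.1.1.1 would also be accepted. One possible way to tighten this, sketched here under assumptions, is to capture each octet and check its numeric range in code. The class name Ipv4Check and the method isValidIpv4 are hypothetical, and this sketch still accepts leading zeros such as "01".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Ipv4Check {
    // Four captured groups of one to three digits, separated by literal dots.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    // Hypothetical helper: true if the whole input is a dotted quad
    // and every octet is in the range 0-255.
    static boolean isValidIpv4(String input) {
        Matcher matcher = DOTTED_QUAD.matcher(input);
        if (!matcher.matches()) {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            if (Integer.parseInt(matcher.group(i)) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidIpv4("192.168.1.1")); // true
        System.out.println(isValidIpv4("999.168.1.1")); // false: 999 is out of range
        System.out.println(isValidIpv4("1.2.3"));       // false: only three octets
    }
}
```

Doing the range check in Java keeps the regular expression short; an all-regex alternative exists but is considerably harder to read.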
7. Match email address
Pattern pattern = Pattern.compile("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z]{2,6}\\b", Pattern.CASE_INSENSITIVE);
// Placeholder address used for illustration
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
    System.out.println("Valid email address: " + matcher.group());
}
8. Match URL
// The hyphen is placed last in each character class so it is read literally
Pattern pattern = Pattern.compile("https?://[\\w.-]+(?:/[\\w./?%&=-]*)?");
Matcher matcher = pattern.matcher("http://www.example.com");
if (matcher.find()) {
    System.out.println("Valid URL: " + matcher.group());
}
9. Match non-negative integers
Pattern pattern = Pattern.compile("\\b\\d+\\b");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Non-negative integer: " + matcher.group());
}
10. Match positive integers
Pattern pattern = Pattern.compile("\\b[1-9]\\d*\\b");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Positive integer: " + matcher.group());
}
11. Match non-positive integers
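// No \b before the minus sign: "-" is not a word character, so \b would
// fail there when the number starts the input
Pattern pattern = Pattern.compile("-[1-9]\\d*\\b|\\b0\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Non-positive integer: " + matcher.group());
}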
12. Match negative integers
// No \b before the minus sign, for the same reason as above
Pattern pattern = Pattern.compile("-[1-9]\\d*\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Negative integer: " + matcher.group());
}
13. Match integers
// The optional sign comes before \b so that it is included in the match
Pattern pattern = Pattern.compile("-?\\b\\d+\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Integer: " + matcher.group());
}
14. Match non-negative floating point numbers
Pattern pattern = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");
Matcher matcher = pattern.matcher("123.45");
if (matcher.find()) {
    System.out.println("Non-negative floating-point number: " + matcher.group());
}
15. Match positive floating point numbers
// Note: as written, this pattern also accepts 0 and 0.0
Pattern pattern = Pattern.compile("\\b[0-9]\\d*(\\.\\d+)?\\b");
Matcher matcher = pattern.matcher("123.45");
if (matcher.find()) {
    System.out.println("Positive floating-point number: " + matcher.group());
}
16. Match non-positive floating point numbers
// Either a negative number, or zero written as 0, 0.0, 0.00, ...
Pattern pattern = Pattern.compile("-\\d+(\\.\\d+)?\\b|\\b0(\\.0+)?\\b");
Matcher matcher = pattern.matcher("-123.45");
if (matcher.find()) {
    System.out.println("Non-positive floating-point number: " + matcher.group());
}
17. Match negative floating point numbers
// No \b before the minus sign, since "-" is not a word character
Pattern pattern = Pattern.compile("-([0-9]\\d*(\\.\\d+)?)\\b");
Matcher matcher = pattern.matcher("-123.45");
if (matcher.find()) {
    System.out.println("Negative floating-point number: " + matcher.group());
}
18. Match English strings
Pattern pattern = Pattern.compile("[A-Za-z]+");
Matcher matcher = pattern.matcher("HelloWorld");
if (matcher.find()) {
    System.out.println("English string: " + matcher.group());
}
19. Match English uppercase strings
Pattern pattern = Pattern.compile("[A-Z]+");
Matcher matcher = pattern.matcher("HELLO");
if (matcher.find()) {
    System.out.println("Uppercase English string: " + matcher.group());
}
20. Match English lowercase strings
Pattern pattern = Pattern.compile("[a-z]+");
Matcher matcher = pattern.matcher("hello");
if (matcher.find()) {
    System.out.println("Lowercase English string: " + matcher.group());
}
21. Match English character and numeric strings
Pattern pattern = Pattern.compile("[A-Za-z0-9]+");
Matcher matcher = pattern.matcher("Hello123");
if (matcher.find()) {
    System.out.println("Alphanumeric string: " + matcher.group());
}
22. Match alphanumeric strings with underscores
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher("Hello_123");
if (matcher.find()) {
    System.out.println("Alphanumeric string with underscores: " + matcher.group());
}
23. Match E-mail addresses
Pattern pattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.[A-Za-z]{2,}");
// Placeholder address used for illustration
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
    System.out.println("Email address: " + matcher.group());
}
24. Match URL
// [a-zA-Z], not [a-zA-z]: the range A-z would also include [ \ ] ^ _ and `
Pattern pattern = Pattern.compile("[a-zA-Z]+://\\S*");
Matcher matcher = pattern.matcher("http://www.example.com");
if (matcher.find()) {
    System.out.println("URL: " + matcher.group());
}
25. Match postal codes
// Five digits, optionally followed by a hyphen and four more digits
Pattern pattern = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");
Matcher matcher = pattern.matcher("12345-6789");
if (matcher.find()) {
    System.out.println("Postal code: " + matcher.group());
}
26. Match Chinese
Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]+");
// Sample text containing Chinese characters
Matcher matcher = pattern.matcher("这是一段中文");
if (matcher.find()) {
    System.out.println("Chinese text: " + matcher.group());
}
27. Match phone numbers
Pattern pattern = Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b");
Matcher matcher = pattern.matcher("123-456-7890");
if (matcher.find()) {
    System.out.println("Phone number: " + matcher.group());
}
28. Match mobile phone numbers
// Eleven digits: a leading 1, then one of 3, 4, 5, 7, 8, then nine more digits
Pattern pattern = Pattern.compile("\\b1[34578]\\d{9}\\b");
Matcher matcher = pattern.matcher("13812345678");
if (matcher.find()) {
    System.out.println("Mobile number: " + matcher.group());
}
29. Match double-byte characters
Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
// Only the two Chinese characters match; "ABC" and the space do not
Matcher matcher = pattern.matcher("测试 ABC");
while (matcher.find()) {
    System.out.println("Double-byte character: " + matcher.group());
}
30. Match leading and trailing spaces
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "  Hello World!  ";
        Pattern pattern = Pattern.compile("^\\s+|\\s+$");
        Matcher matcher = pattern.matcher(text);
        // Replace leading and trailing spaces
        String result = matcher.replaceAll("");
        System.out.println("Original string: '" + text + "'");
        System.out.println("After removing leading and trailing spaces: '" + result + "'");
    }
}
31. Match Chinese characters
Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
// The full-width punctuation marks fall outside the range and are skipped
Matcher matcher = pattern.matcher("你好，世界！");
while (matcher.find()) {
    System.out.println("Matched Chinese character: " + matcher.group());
}
32. Match double-byte characters (including Chinese characters)
Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
// Sample text mixing Chinese characters with ASCII letters
Matcher matcher = pattern.matcher("双字节字符测试 abc");
while (matcher.find()) {
    System.out.println("Matched double-byte character: " + matcher.group());
}
33. Match empty lines
Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("First line\n\nThird line");
while (matcher.find()) {
    System.out.println("Matched empty line at index: " + matcher.start());
}
34. Match HTML tags
// \\1 is a backreference to the tag name captured by group 1
Pattern pattern = Pattern.compile("<(\\S*?)[^>]*>.*?</\\1>|<.*? />");
Matcher matcher = pattern.matcher("<html><head></head><body></body></html>");
while (matcher.find()) {
    System.out.println("Matched HTML tag: " + matcher.group());
}
35. Match leading and trailing spaces
Pattern pattern = Pattern.compile("^\\s+|\\s+$");
Matcher matcher = pattern.matcher(" Hello World! ");
String result = matcher.replaceAll("");
System.out.println("String after removing leading and trailing spaces: " + result);
36. Match IP address
Pattern pattern = Pattern.compile("\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b");
Matcher matcher = pattern.matcher("192.168.1.1 and 10.0.0.1");
while (matcher.find()) {
    System.out.println("Matched IP Address: " + matcher.group());
}
37. Match email addresses
Pattern pattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,6}");
// Placeholder address used for illustration
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
    System.out.println("Matched Email: " + matcher.group());
}
38. Match URL
// Host part, then any run of path and query characters
Pattern pattern = Pattern.compile("https?://[\\w.]+[/\\w.?%&=]*");
Matcher matcher = pattern.matcher("Visit https://www.example.com!");
while (matcher.find()) {
    System.out.println("Matched URL: " + matcher.group());
}
Best Practices
- Understand and test: Regular expressions can be complex, and it’s important to understand their components and test their behavior.
- Performance considerations: Regular expressions can impact the performance of your application, especially when processing large amounts of text.
- Avoid overuse: In some cases, simple string manipulation is more efficient than a complex regular expression.
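The performance and overuse points can be illustrated with a short sketch: compile a pattern once and reuse it, and fall back to a plain String method when no pattern is needed. The class name PrecompiledPatternDemo, both helper methods, and the "ERROR" prefix are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

public class PrecompiledPatternDemo {
    // Compiled once and reused. Pattern instances are immutable and safe to
    // share between threads, so a static final field is a common choice.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: true if the input consists only of digits.
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Hypothetical helper: a fixed-prefix check needs no regular expression.
    static boolean hasErrorPrefix(String line) {
        return line.startsWith("ERROR");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));               // true
        System.out.println(isAllDigits("12a45"));               // false
        System.out.println(hasErrorPrefix("ERROR: disk full")); // true
    }
}
```

Calling String.matches or String.replaceAll in a loop compiles the pattern again on every call, which is the usual reason to hold a precompiled Pattern instead.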
Conclusion
Regular expressions are a powerful tool that can be useful in a variety of string processing scenarios.