A comprehensive explanation of Java regular expressions, with code samples

In-Depth Guide to Java Regular Expressions

Introduction

Regular expressions are a powerful text-matching tool, widely used for operations such as string search and replacement. In Java, regular expressions are used through the Pattern and Matcher classes in the java.util.regex package. This article takes a more in-depth look at the symbols and patterns that make up a regular expression, including braces, parentheses, square brackets, and the start and end anchors.
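
As a minimal sketch of how the two classes fit together: a Pattern holds the compiled expression, and a Matcher applies it to one input. The class and method names below (PatternMatcherDemo, findAll, isAllDigits) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Compile once; a Pattern is immutable and can be reused.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // find() scans the input for the next substring that matches.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(input);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    // matches() succeeds only if the entire input matches the pattern.
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(findAll("order 66, item 7")); // [66, 7]
        System.out.println(isAllDigits("2024"));          // true
        System.out.println(isAllDigits("20x24"));         // false
    }
}
```

The distinction matters in the examples that follow: most of them call find(), so they report a match anywhere inside the input rather than validating the whole string.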

Components of regular expressions

1. Character class

  • Brackets [ ]: Define a character set. For example, [abc] matches “a”, “b”, or “c”.
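  • Predefined character classes: such as \d (any digit), \s (any whitespace character), and \w (any word character: a letter, digit, or underscore). In a Java string literal the backslash must be doubled, for example "\\d".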
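
The character classes above can be tried out with a short sketch like the following. The predefined classes are written with a backslash in the regex, which is doubled inside a Java string literal. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // In Java source, the regex \d is written "\\d" because the string
    // literal itself consumes one level of backslash escaping.
    static boolean isVowel(String s)      { return Pattern.matches("[aeiou]", s); }
    static boolean isDigit(String s)      { return Pattern.matches("\\d", s); }
    static boolean isWhitespace(String s) { return Pattern.matches("\\s", s); }
    static boolean isWordChar(String s)   { return Pattern.matches("\\w", s); }

    public static void main(String[] args) {
        System.out.println(isVowel("e"));       // true
        System.out.println(isDigit("7"));       // true
        System.out.println(isWhitespace("\t")); // true
        System.out.println(isWordChar("_"));    // true
        System.out.println(isWordChar("-"));    // false
    }
}
```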

2. Quantifier

  • Asterisk *: Zero or more matches.
  • Plus sign + : One or more matches.
  • Question mark ?: Zero or one match.
  • Braces { }: Customized number of matches. For example, X{2} (X appears twice), X{2,} (at least twice), X{2,5} (two to five times).
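
A small sketch of the quantifiers listed above. It uses Pattern.matches, which requires the whole input to match, so the lower and upper bounds are easy to observe. QuantifierDemo and fullyMatches are hypothetical names.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only when the entire input matches the expression.
    static boolean fullyMatches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullyMatches("a*", ""));            // true: zero occurrences allowed
        System.out.println(fullyMatches("a+", ""));            // false: at least one required
        System.out.println(fullyMatches("colou?r", "color"));  // true: the "u" is optional
        System.out.println(fullyMatches("a{2}", "aaa"));       // false: exactly two required
        System.out.println(fullyMatches("a{2,}", "aaa"));      // true: two or more
        System.out.println(fullyMatches("a{2,5}", "aaaaaa"));  // false: six exceeds the upper bound
    }
}
```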

3. Boundary matching characters

  • Caret ^: Matches the beginning of the input string.
  • Dollar sign $: Matches the end of the input string.
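
By default ^ and $ anchor to the whole input. With the Pattern.MULTILINE flag they also match at the start and end of each line. The sketch below counts how many times "^\w+" matches with and without the flag; AnchorDemo and countLineStarts are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Counts matches of a word anchored with ^, using the given flags
    // (0 for none, or Pattern.MULTILINE).
    static int countLineStarts(String text, int flags) {
        Matcher matcher = Pattern.compile("^\\w+", flags).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        System.out.println(countLineStarts(text, 0));                 // 1
        System.out.println(countLineStarts(text, Pattern.MULTILINE)); // 3
    }
}
```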

4. Grouping and capturing

  • Parentheses ( ): Mark the beginning and end of a subexpression and capture the text it matches. For example, (abc) matches “abc” and captures it as group 1.

5. Special structure

  • Non-capturing group (?: ): Groups the expression within the parentheses for matching, but does not capture the matched text.
  • Positive lookahead assertion (?= ): Succeeds if the characters that follow match the expression within the parentheses. The lookahead itself consumes no characters.
  • Negative lookahead assertion (?! ): Succeeds if the characters that follow do not match the expression within the parentheses. It also consumes no characters.
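
A brief sketch of two of these constructs: a negative lookahead that finds "Java" only when it is not followed by "Script", and a check that a non-capturing group does not add to the group count. The names LookaheadDemo, countJavaOnly, and capturingGroups are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // "Java" only when it is NOT immediately followed by "Script".
    private static final Pattern JAVA_ONLY = Pattern.compile("Java(?!Script)");

    static int countJavaOnly(String text) {
        Matcher matcher = JAVA_ONLY.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Number of capturing groups a pattern defines; (?: ) groups are not counted.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countJavaOnly("Java, JavaScript, and Java SE")); // 2
        System.out.println(capturingGroups("a(bc)*"));   // 1
        System.out.println(capturingGroups("a(?:bc)*")); // 0
    }
}
```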

Practical application examples

Example 1: Using braces

Pattern pattern = Pattern.compile("\\d{2,4}");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}

Example 2: Grouping using parentheses

Pattern pattern = Pattern.compile("(\\d+)([a-z])");
Matcher matcher = pattern.matcher("123abc");
while (matcher.find()) {
    System.out.println("Group 1: " + matcher.group(1));
    System.out.println("Group 2: " + matcher.group(2));
}

Example 3: Using boundary matches

Pattern pattern = Pattern.compile("^The");
Matcher matcher = pattern.matcher("The end");
if (matcher.find()) {
    System.out.println("Match found at the start of string");
}

Example 4: Using non-capturing groups

Pattern pattern = Pattern.compile("a(?:bc)*");
Matcher matcher = pattern.matcher("abcbcbc");
if (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}

Example 5: Using positive lookahead assertions

// Matches a digit only when the next character is a non-digit
Pattern pattern = Pattern.compile("\\d(?=\\D)");
Matcher matcher = pattern.matcher("123a");
while (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}

More code sample collections

1. Match Chinese characters

Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
Matcher matcher = pattern.matcher("This is a 中文 test");
if (matcher.find()) {
    System.out.println("Contains Chinese characters");
}

2. Match double-byte characters (including Chinese characters)

// Matches any character outside the range U+0000 to U+00FF
Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
Matcher matcher = pattern.matcher("测试 ABC");
while (matcher.find()) {
    System.out.println("Double-byte character found: " + matcher.group());
}

3. Match empty lines

Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("First line\n\nThird line");
while (matcher.find()) {
    System.out.println("Empty line found at index: " + matcher.start());
}

4. Match HTML tags

// Quotes inside the pattern are escaped for the Java string literal
Pattern pattern = Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");
Matcher matcher = pattern.matcher("<html><head></head><body></body></html>");
while (matcher.find()) {
    System.out.println("HTML tag found: " + matcher.group());
}

5. Match leading and trailing spaces (remove leading and trailing spaces)

String input = "  Hello World  ";
String result = input.replaceAll("^\\s+|\\s+$", "");
System.out.println("Trimmed String: " + result);

6. Match IP address

// Checks the dotted-quad shape only; it does not limit each part to 0-255
Pattern pattern = Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");
Matcher matcher = pattern.matcher("192.168.1.1");
if (matcher.find()) {
    System.out.println("IP address found: " + matcher.group());
}
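
A pattern built from 1-to-3-digit groups also accepts strings such as 999.999.999.999. As a hedged sketch of one way to tighten it, each octet can be restricted to 0-255 with an alternation and the whole input checked with matches(). The names Ipv4Validator, OCTET, and isValid are hypothetical, and this version deliberately rejects leading zeros such as "01".

```java
import java.util.regex.Pattern;

class Ipv4Validator {
    // One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

    // Three "octet." groups followed by a final octet.
    private static final Pattern IPV4 =
            Pattern.compile("(?:" + OCTET + "\\.){3}" + OCTET);

    // matches() requires the entire input to be an address.
    static boolean isValid(String candidate) {
        return IPV4.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.1.1"));     // true
        System.out.println(isValid("255.255.255.255")); // true
        System.out.println(isValid("256.1.1.1"));       // false
        System.out.println(isValid("1.2.3"));           // false
    }
}
```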

7. Match email address

Pattern pattern = Pattern.compile("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z]{2,6}\\b", Pattern.CASE_INSENSITIVE);
// example.com is a reserved placeholder domain
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
    System.out.println("Valid email address: " + matcher.group());
}

8. Match URL

Pattern pattern = Pattern.compile("https?://[\\w.-]+(?:/[\\w\\-./?%&=]*)?");
Matcher matcher = pattern.matcher("http://www.example.com");
if (matcher.find()) {
    System.out.println("Valid URL: " + matcher.group());
}

9. Match non-negative integers

Pattern pattern = Pattern.compile("\\b\\d+\\b");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Non-negative integer: " + matcher.group());
}

10. Match positive integers

Pattern pattern = Pattern.compile("\\b[1-9]\\d*\\b");
Matcher matcher = pattern.matcher("123");
if (matcher.find()) {
    System.out.println("Positive integer: " + matcher.group());
}

11. Match non-positive integers

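// No \b before the minus sign: "-" is not a word character, so \b would fail at the start of the input
Pattern pattern = Pattern.compile("-[1-9]\\d*\\b|\\b0\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {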
    System.out.println("Non-positive integer: " + matcher.group());
}

12. Match negative integers

// No \b before the minus sign, for the same reason as above
Pattern pattern = Pattern.compile("-[1-9]\\d*\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Negative integer: " + matcher.group());
}

13. Match integers

// The optional sign comes before \b so that it is included in the match
Pattern pattern = Pattern.compile("-?\\b\\d+\\b");
Matcher matcher = pattern.matcher("-123");
if (matcher.find()) {
    System.out.println("Integer: " + matcher.group());
}

14. Match non-negative floating point numbers

Pattern pattern = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");
Matcher matcher = pattern.matcher("123.45");
if (matcher.find()) {
    System.out.println("Non-negative floating-point number: " + matcher.group());
}

15. Match positive floating point numbers

// Note: this pattern also accepts 0 and 0.0, so it does not strictly exclude zero
Pattern pattern = Pattern.compile("\\b[0-9]\\d*(\\.\\d+)?\\b");
Matcher matcher = pattern.matcher("123.45");
if (matcher.find()) {
    System.out.println("Positive floating-point number: " + matcher.group());
}

16. Match non-positive floating point numbers

// Either a negative number, or zero written as 0, 0.0, 0.00, ...
Pattern pattern = Pattern.compile("-\\d+(\\.\\d+)?\\b|\\b0(\\.0+)?\\b");
Matcher matcher = pattern.matcher("-123.45");
if (matcher.find()) {
    System.out.println("Non-positive floating-point number: " + matcher.group());
}

17. Match negative floating point numbers

Pattern pattern = Pattern.compile("-([0-9]\\d*(\\.\\d+)?)\\b");
Matcher matcher = pattern.matcher("-123.45");
if (matcher.find()) {
    System.out.println("Negative floating-point number: " + matcher.group());
}

18. Match English strings

Pattern pattern = Pattern.compile("[A-Za-z]+");
Matcher matcher = pattern.matcher("HelloWorld");
if (matcher.find()) {
    System.out.println("English string: " + matcher.group());
}

19. Match English uppercase strings

Pattern pattern = Pattern.compile("[A-Z]+");
Matcher matcher = pattern.matcher("HELLO");
if (matcher.find()) {
    System.out.println("Uppercase English string: " + matcher.group());
}

20. Match English lowercase strings

Pattern pattern = Pattern.compile("[a-z]+");
Matcher matcher = pattern.matcher("hello");
if (matcher.find()) {
    System.out.println("Lowercase English string: " + matcher.group());
}

21. Match strings of English letters and digits

Pattern pattern = Pattern.compile("[A-Za-z0-9]+");
Matcher matcher = pattern.matcher("Hello123");
if (matcher.find()) {
    System.out.println("Alphanumeric string: " + matcher.group());
}

22. Match strings of letters, digits, and underscores

Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher("Hello_123");
if (matcher.find()) {
    System.out.println("Alphanumeric string with underscores: " + matcher.group());
}

23. Match E-mail addresses

Pattern pattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.[A-Za-z]{2,}");
// example.com is a reserved placeholder domain
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
    System.out.println("Email address: " + matcher.group());
}

24. Match URL

// [a-zA-Z], not [a-zA-z]: the range A-z would also include characters such as [ and _
Pattern pattern = Pattern.compile("[a-zA-Z]+://\\S*");
Matcher matcher = pattern.matcher("http://www.example.com");
if (matcher.find()) {
    System.out.println("URL: " + matcher.group());
}

25. Match postal codes (US ZIP and ZIP+4 format)

Pattern pattern = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");
Matcher matcher = pattern.matcher("12345-6789");
if (matcher.find()) {
    System.out.println("Postal code: " + matcher.group());
}

26. Match Chinese

Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]+");
Matcher matcher = pattern.matcher("This paragraph contains 中文文本");
if (matcher.find()) {
    System.out.println("Chinese text: " + matcher.group());
}

27. Match phone numbers

Pattern pattern = Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b");
Matcher matcher = pattern.matcher("123-456-7890");
if (matcher.find()) {
    System.out.println("Phone number: " + matcher.group());
}

28. Match mobile phone numbers

Pattern pattern = Pattern.compile("\\b1[34578]\\d{9}\\b");
Matcher matcher = pattern.matcher("13812345678");
if (matcher.find()) {
    System.out.println("Mobile number: " + matcher.group());
}

29. Match double-byte characters

Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
Matcher matcher = pattern.matcher("测试 ABC");
while (matcher.find()) {
    System.out.println("Double-byte character: " + matcher.group());
}

30. Match leading and trailing spaces

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "  Hello World!  ";
        Pattern pattern = Pattern.compile("^\\s+|\\s+$");
        Matcher matcher = pattern.matcher(text);

        //Replace leading and trailing spaces
        String result = matcher.replaceAll("");
        System.out.println("Original string: '" + text + "'");
        System.out.println("After removing leading and trailing spaces: '" + result + "'");
    }
}

31. Match Chinese characters

Pattern pattern = Pattern.compile("[\\u4e00-\\u9fa5]");
Matcher matcher = pattern.matcher("你好, World!");
while (matcher.find()) {
    System.out.println("Matched Chinese character: " + matcher.group());
}

32. Match double-byte characters (including Chinese characters)

Pattern pattern = Pattern.compile("[^\\x00-\\xff]");
Matcher matcher = pattern.matcher("双字节字符测试 abc");
while (matcher.find()) {
    System.out.println("Matched double-byte character: " + matcher.group());
}

33. Match empty lines

Pattern pattern = Pattern.compile("^\\s*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("First line\n\nThird line");
while (matcher.find()) {
    System.out.println("Matched empty line at index: " + matcher.start());
}

34. Match HTML tags

// \1 is a backreference to the tag name captured by group 1
Pattern pattern = Pattern.compile("<(\\S*?)[^>]*>.*?</\\1>|<.*? />");
Matcher matcher = pattern.matcher("<html><head></head><body></body></html>");
while (matcher.find()) {
    System.out.println("Matched HTML tag: " + matcher.group());
}

35. Match leading and trailing spaces

Pattern pattern = Pattern.compile("^\\s+|\\s+$");
Matcher matcher = pattern.matcher(" Hello World! ");
String result = matcher.replaceAll("");
System.out.println("String after removing leading and trailing spaces: " + result);

36. Match IP address

Pattern pattern = Pattern.compile("\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b");
Matcher matcher = pattern.matcher("192.168.1.1 and 10.0.0.1");
while (matcher.find()) {
    System.out.println("Matched IP Address: " + matcher.group());
}

37. Match email addresses

Pattern pattern = Pattern.compile("[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,6}");
// example.com is a reserved placeholder domain
Matcher matcher = pattern.matcher("user@example.com");
if (matcher.find()) {
    System.out.println("Matched Email: " + matcher.group());
}

38. Match URL

Pattern pattern = Pattern.compile("https?://[\\w.]+[/\\w.?%&=]*");
Matcher matcher = pattern.matcher("Visit https://www.example.com!");
while (matcher.find()) {
    System.out.println("Matched URL: " + matcher.group());
}

Best Practices

  • Understand and test: Regular expressions can be complex, and it’s important to understand their components and test their behavior.
  • Performance considerations: Regular expressions can slow an application down, especially when processing large amounts of text. Compile a Pattern once and reuse it rather than recompiling it on every call.
  • Avoid overuse: In some cases, simple string manipulation is clearer and more efficient than a complex regular expression.
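
A minimal sketch of two of these points: a Pattern compiled once into a constant and reused across calls, and a plain String method used where no regular expression is needed. The names BestPracticeDemo, collapseSpaces, and trimEnds are hypothetical.

```java
import java.util.regex.Pattern;

class BestPracticeDemo {
    // Compiled once and reused; String.replaceAll would recompile the regex on each call.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Replaces every run of whitespace with a single space.
    static String collapseSpaces(String input) {
        return WHITESPACE_RUN.matcher(input).replaceAll(" ");
    }

    // Trimming both ends needs no regular expression at all.
    static String trimEnds(String input) {
        return input.trim();
    }

    public static void main(String[] args) {
        System.out.println(collapseSpaces("a  b \t c"));                // a b c
        System.out.println("'" + trimEnds("  Hello World  ") + "'");    // 'Hello World'
    }
}
```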

Conclusion

Regular expressions are a powerful tool that can be useful in a variety of string processing scenarios.