Boost.Regex regular expressions

Article directory

    • Metacharacters
    • regex_match function
    • regex_search function
    • regex_replace function
    • Example program

Metacharacters

Some common metacharacters:

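+ matches the preceding element one or more times
* matches the preceding element zero or more times
. matches any single character (Boost.Regex excludes the newline character only when the match_not_dot_newline flag is passed)
\w matches a letter, digit, or underscore; for ASCII text it is equivalent to [A-Za-z0-9_], and which further characters count as word characters depends on the locale
\s matches any whitespace character
\d matches a digit; inside a C++ string literal the backslash must itself be escaped, so it is written as \\d
\b matches the beginning or end of a word
^ matches the beginning of a string
$ matches the end of the string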
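The list above can be tried out with a few lines of code. The following is only a minimal sketch: it uses the standard <regex> header instead of Boost so that it builds without extra libraries, on the assumption that the same patterns behave the same way with boost::regex. The helper names matches_whole and matches_somewhere are made up for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
bool matches_whole(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: true when `pattern` matches somewhere inside `text`.
bool matches_somewhere(const std::string& text, const std::string& pattern)
{
    return std::regex_search(text, std::regex(pattern));
}
```

With these helpers, matches_whole("abc_123", "\\w+") and matches_whole("2024", "\\d+") are true, while matches_whole("a b", "\\w+") is false because the space is not a word character. matches_somewhere("a cat sat", "\\bcat\\b") is true, but the same pattern does not match inside "concatenate", since \b requires a word boundary on both sides.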

Escape character

cout << regex_match("123", regex("\d+")) << endl; //The result is 0: "\d" is not a valid C++ escape sequence, so the regex never receives the backslash
cout << regex_match("123", regex("\\d+")) << endl; //The result is 1, a complete match
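Doubling every backslash quickly becomes hard to read. Since C++11, a raw string literal offers an alternative spelling of the same pattern. The sketch below uses the standard <regex> header to stay self-contained, and the function names are made up for this illustration. The same literals can be passed to the boost::regex constructor, because only the way the string is written in source code changes.

```cpp
#include <regex>
#include <string>

// Pattern written with an escaped backslash: the regex receives \d+.
bool all_digits_escaped(const std::string& text)
{
    return std::regex_match(text, std::regex("\\d+"));
}

// Same pattern as a raw string literal: no doubling of the backslash needed.
bool all_digits_raw(const std::string& text)
{
    return std::regex_match(text, std::regex(R"(\d+)"));
}
```

Both functions accept "123" and reject "12a", because the two literals produce the identical three-character pattern \d+.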

Boost.Regex, the regular expression library of the Boost C++ Libraries, brings regular expressions to C++. Regular expressions greatly reduce the effort of searching strings for specific patterns and are a powerful feature of many languages. Boost.Regex was also the basis for the regular expression support that entered the C++ standard library with C++11 (header <regex>), so the classes and functions described here have close counterparts in namespace std.

The two most important classes in the Boost.Regex library are boost::regex and boost::smatch, both of which are defined in the boost/regex.hpp file. The former is used to define a regular expression, while the latter can save search results.

The following sections introduce the three functions that Boost.Regex provides for working with regular expressions.

regex_match function

The function boost::regex_match() is used to compare strings with regular expressions. The return value is true when the entire string matches the regular expression.

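#include <boost/regex.hpp>
#include <locale>
#include <string>
#include <iostream>

int main()
{
  // Locale names are platform-specific; "German" is accepted on Windows.
  std::locale::global(std::locale("German"));
  std::string s = "Boris Schäling";
  // Backslashes are doubled so that the regex receives \w+\s\w+.
  boost::regex expr("\\w+\\s\\w+");
  std::cout << boost::regex_match(s, expr) << std::endl;
}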

regex_search function

The function boost::regex_search() can be used to search for a regular expression in a string.

#include <boost/regex.hpp>
#include <locale>
#include <string>
#include <iostream>

int main()
{
  // Locale names are platform-specific; "German" is accepted on Windows.
  std::locale::global(std::locale("German"));
  std::string s = "Boris Schäling";
  // Two groups: the first word and the second word.
  boost::regex expr("(\\w+)\\s(\\w+)");
  boost::smatch what;
  if (boost::regex_search(s, what, expr))
  {
    std::cout << what[0] << std::endl;
    std::cout << what[1] << " " << what[2] << std::endl;
  }
}

The function boost::regex_search() can accept a reference parameter of type boost::smatch in which it stores the result. Unlike boost::regex_match(), it succeeds as soon as some part of the string matches the regular expression. The expression in this example contains two parenthesized groups, so in addition to the complete match the result holds two substrings, one per group.
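The difference between matching and searching is easiest to see when the same pattern is applied to the same input with both functions. The sketch below uses std::regex_match and std::regex_search from the standard <regex> header, on the assumption that they behave like their Boost namesakes for this pattern. The function names are made up for this illustration.

```cpp
#include <regex>
#include <string>

// True only if the whole string has the form "<word> <word>".
bool is_two_words(const std::string& text)
{
    static const std::regex expr(R"(\w+\s\w+)");
    return std::regex_match(text, expr);
}

// True if "<word> <word>" occurs anywhere in the string.
bool contains_two_words(const std::string& text)
{
    static const std::regex expr(R"(\w+\s\w+)");
    return std::regex_search(text, expr);
}
```

For "hello world" both functions return true. For "say hello world!" only contains_two_words() returns true, because the match function requires the pattern to cover the input from the first character to the last.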

The class boost::smatch that stores the result is a container of boost::sub_match elements with an interface similar to that of std::vector. For example, individual elements can be accessed through operator[].

The class boost::sub_match, in turn, holds a pair of iterators marking the positions in the string that correspond to a group of the regular expression. Because it inherits from std::pair, these iterators are available as first and second. If, as in the example above, the substring is only written to the standard output stream, the overloaded operator<< does this directly and there is no need to touch the iterators.
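To show what the iterator pair looks like in practice, the following sketch rebuilds each substring from first and second instead of streaming the sub-match. It relies on std::smatch and std::sub_match from the standard <regex> header, which also derive from std::pair; that the Boost classes behave identically here is an assumption. collect_groups is a hypothetical helper.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of every sub-match of the first
// match found in `text` (index 0 is the complete match). The result is
// empty when `expr` does not match at all.
std::vector<std::string> collect_groups(const std::string& text, const std::regex& expr)
{
    std::vector<std::string> groups;
    std::smatch what;
    if (std::regex_search(text, what, expr))
    {
        for (std::size_t i = 0; i < what.size(); ++i)
        {
            // first and second delimit the matched range inside `text`.
            groups.emplace_back(what[i].first, what[i].second);
        }
    }
    return groups;
}
```

Called with "hello world" and the pattern (\w+)\s(\w+), the helper returns three strings: "hello world", "hello", and "world".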

Note that boost::sub_match stores only iterators and does not copy the matched characters, so the results are valid only as long as the string the iterators refer to still exists.

Also note that the first element of boost::smatch (index 0) refers to the entire part of the string that matched the regular expression; the substring matching the first group is found at index 1.
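One way to avoid the lifetime issue is to copy the substrings out of the match object before the searched string goes away. The sketch below does this with str(), which returns an independent std::string. It is written against the standard <regex> header, assuming boost::sub_match::str() works the same way, and split_name is a hypothetical helper.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: splits "First Last" into two owned strings.
// Returns a pair of empty strings when the input does not have that shape.
std::pair<std::string, std::string> split_name(const std::string& full_name)
{
    static const std::regex expr(R"((\w+)\s(\w+))");
    std::smatch what;
    if (!std::regex_match(full_name, what, expr))
        return {};
    // Index 0 would be the complete match; groups start at index 1.
    // str() copies the characters, so the returned strings stay valid
    // after `what` and `full_name` are gone.
    return { what[1].str(), what[2].str() };
}
```

split_name("Boris Schaling") yields the pair ("Boris", "Schaling"), and both strings remain usable no matter what happens to the argument afterwards.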

regex_replace function

The third function provided by Boost.Regex is boost::regex_replace(). In addition to the string and the regular expression, boost::regex_replace() requires a format parameter, which determines how substrings and groups matching the regular expression are replaced. If the regular expression does not contain any groups, every matching substring is replaced with the given format. The boost::regex_replace() function always searches the entire string, so the following program replaces all three spaces (the leading one, the one between the names, and the trailing one) with underscores, and its output is _Boris_Schäling_.

#include <boost/regex.hpp>
#include <locale>
#include <string>
#include <iostream>

int main()
{
  // Locale names are platform-specific; "German" is accepted on Windows.
  std::locale::global(std::locale("German"));
  std::string s = " Boris Schäling ";
  // Matches a single whitespace character.
  boost::regex expr("\\s");
  std::string fmt("_");
  std::cout << boost::regex_replace(s, expr, fmt) << std::endl;
}


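#include <boost/regex.hpp>
#include <locale>
#include <string>
#include <iostream>

int main()
{
  // Locale names are platform-specific; "German" is accepted on Windows.
  std::locale::global(std::locale("German"));
  std::string s = "Boris Schäling";
  boost::regex expr("(\\w+)\\s(\\w+)");
  // \1 and \2 refer to the first and second group; the backslashes are doubled for C++.
  std::string fmt("\\2 \\1");
  std::cout << boost::regex_replace(s, expr, fmt) << std::endl;
}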

The format parameter can refer to the substrings captured by the groups of the regular expression. This second example uses that technique to swap the first and last names, so the output is Schäling Boris.
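Group references in the format string can also be written as $1, $2, and so on. The sketch below uses std::regex_replace from the standard <regex> header, whose default format syntax uses this notation; that the default format syntax of Boost.Regex accepts $N in the same way is an assumption here. last_name_first is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns "First Last" into "Last, First".
// Input without a "<word> <word>" part is returned unchanged.
std::string last_name_first(const std::string& full_name)
{
    static const std::regex expr(R"((\w+)\s(\w+))");
    // $1 and $2 stand for the text captured by the first and second group.
    return std::regex_replace(full_name, expr, "$2, $1");
}
```

last_name_first("Boris Schaling") returns "Schaling, Boris". Text around the format references, such as the comma here, is copied into the result literally.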

Example program

#include <boost/regex.hpp>
#include <cstdio>
#include <iostream>
#include <string>


boost::regex g_subexp("e[cl][oe][mc]");
boost::regex g_expr("^select ([a-zA-Z]*)\\sfrom\\s([a-zA-Z]*)\\ssse");

int Test()
{
    boost::cmatch what;
    std::string content = "select name from table sse";
    boost::cmatch sub;

    if (boost::regex_match(content.c_str(), what, g_expr))
    {
        // regex_match: the entire input must match, otherwise the call fails.
        std::cout << "boost::cmatch size: " << what.size() << std::endl;
        for (unsigned int i = 0; i < what.size(); i++)
            std::cout << "str: " << what[i].str() << std::endl;
    }
    else
    {
        std::cout << "Error Match" << std::endl;
    }
    printf("%s\n", content.c_str());
    while (boost::regex_search(content.c_str(), sub, g_subexp))
    {
        // Search for one match at a time and print each result.
        printf("%s\n", sub.base());
        printf("%s\n", sub[0].str().c_str());
        // Continue with the text after the match. A copy is made first
        // because sub[0].second points into content's own buffer.
        content = std::string(sub[0].second);
        std::cout << "content: " << content << std::endl;
    }
    return 0;
}


int main(int argc, char *argv[])
{
    {
        std::string s = "Boris Schaling";
        boost::regex expr("\\w+\\s\\w+");
        std::cout << boost::regex_match(s, expr) << std::endl;
    }

    {
        std::string s = "hello world";
        // Groups are delimited by parentheses (). The expression below has two pairs, so there are two groups of data.
        boost::regex expr("(\\w+)\\s(\\w+)");
        boost::smatch what;
        if (boost::regex_search(s, what, expr))
        {
            std::cout << what[0] << std::endl;
            std::cout << what[1] << " " << what[2] << std::endl;
        }
    }

    {
        std::string s = " Boris Schaling ";
        boost::regex expr("\\s");
        std::string fmt("_");
        std::cout << boost::regex_replace(s, expr, fmt) << std::endl;
    }

    Test();

    return 0;
}
/*
output:
1
hello world
hello world
_Boris_Schaling_
boost::cmatch size: 3
str: select name from table sse
str: name
str: table
select name from table sse
select name from table sse
elec
content: t name from table sse
*/
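The search loop in Test() shortens the string after every hit. The same idea can be sketched without modifying the input by moving a start iterator forward instead. The version below uses the iterator overload of std::regex_search from the standard <regex> header, assuming the corresponding boost::regex_search overload behaves alike. find_all is a hypothetical helper.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every non-overlapping match of `expr` in `text`.
std::vector<std::string> find_all(const std::string& text, const std::regex& expr)
{
    std::vector<std::string> found;
    std::string::const_iterator begin = text.begin();
    const std::string::const_iterator end = text.end();
    std::smatch what;
    while (std::regex_search(begin, end, what, expr))
    {
        found.push_back(what[0].str());
        // Resume right after the current match.
        begin = what[0].second;
        if (what[0].length() == 0)
        {
            // An empty match would not advance the search position;
            // step one character forward, or stop at the end of the text.
            if (begin == end)
                break;
            ++begin;
        }
    }
    return found;
}
```

For the input and pattern used in Test(), find_all("select name from table sse", std::regex("e[cl][oe][mc]")) returns the single element "elec", which agrees with the output listed above. Because each search starts from a fresh position, anchors such as ^ are evaluated relative to that position, which is a simplification of this sketch.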