regex

Regex is a big subject. You can buy entire books about regex, and if you are relatively young you will probably live long enough to finish reading one. In a language as exactly specified as C++, any regex implementation is bound to be complex, if only because of the sheer length of its specification. There is a good discussion of the essentials of the TR1 implementation on John Cook's site, and we will cover some of the same ground.

Fortunately, the functionality is available via a single include:

#include <regex>

For many (most?) uses, there are three things to get done before you can use the core functions, and there are two or three functions in that core that will do most of the heavy lifting.

  1. Put the text to be searched in an iterable container (usually a std::string).
  2. Put the regex into a std::regex object.
  3. Figure out what options you want to use.

The first two steps are easy. Let's find the vowels in my name:

std::string s("George Kelly Flanagin"); // an iterable container.
std::regex  r("[aeiouy]");              // matches one lowercase vowel (counting y).
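
To see those two objects do some work, here is a minimal sketch that counts how many times a regex matches in a string. The helper name count_matches is hypothetical, and it leans on std::sregex_iterator, a standard iterator over successive matches that this section does not otherwise discuss, so treat it as one possible approach rather than the canonical one.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of r in s.
std::size_t count_matches(const std::string& s, const std::regex& r)
{
    // Iterates over each successive match of r within [s.begin(), s.end()).
    std::sregex_iterator first(s.begin(), s.end(), r);
    // A default-constructed sregex_iterator marks the end of the sequence.
    std::sregex_iterator last;
    return static_cast<std::size_t>(std::distance(first, last));
}
```

With the s and r defined above, count_matches(s, r) returns 8: three vowels in "George", two in "Kelly" (the y counts), and three in "Flanagin".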

Naturally, there are a number of switches that can be passed to the functions that make up the public interface of the library, and they more or less tell the tale of what can be done. The switches are all constants in the std::regex_constants namespace. The ones listed here are of type syntax_option_type, and you and your team should probably agree on a default set to avoid a maintenance nightmare.

icase      : ignore case.
nosubs     : forget about capturing subexpressions that match.
optimize   : aim for speed of matching rather than speed of constructing the regex.
collate    : pay attention to locale when using ranges like [a-f]
ECMAScript : JavaScript-style syntax for the expressions themselves. (This is
             the default, btw).
basic      : basic POSIX regex syntax.
extended   : extended POSIX regex syntax.
awk        : POSIX awk utility syntax.
grep       : POSIX grep utility syntax.
egrep      : POSIX grep utility syntax you get from the "-E" option.

These flags are optional; you pass the ones you want when you construct the regex. It is important to keep in mind that the values of these constants are implementation-defined, so always use the symbols rather than their numeric values.
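
As a rough sketch of how the flags are used, the snippet below combines two of them with | and hands the result to the std::regex constructor. The function name has_vowel_any_case is hypothetical, and it calls regex_search, which is introduced in the next paragraph.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if s contains a vowel, regardless of case.
bool has_vowel_any_case(const std::string& s)
{
    // Combine the flags by symbol with |; never hard-code their values,
    // because those values are implementation-defined.
    static const std::regex r(
        "[aeiouy]",
        std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_search(s, r);
}
```

Because of icase, has_vowel_any_case("GEORGE") is true even though the expression lists only lowercase letters. Making the regex static is a design choice for the sketch: it is built once rather than on every call.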

And now we come to the functions. The two that do the finding are regex_search and regex_match. The difference is that regex_search succeeds if the expression matches anywhere in the text, while regex_match succeeds only if the expression matches the entire text. If you guessed that they are available in many flavors of overloads, you are correct -- the overload definitions fill several pages of the standard. There is also regex_replace, which allows you to duplicate the functionality of editors like sed and vi within your program.
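
The three functions can be sketched side by side as follows. The wrapper names (contains, matches_whole, first_match, replace_all) are hypothetical and exist only to label each call; each one uses just one of the many overloads the standard provides.

```cpp
#include <regex>
#include <string>

// regex_search: true if r matches anywhere inside s.
bool contains(const std::string& s, const std::regex& r)
{
    return std::regex_search(s, r);
}

// regex_match: true only if r matches the whole of s.
bool matches_whole(const std::string& s, const std::regex& r)
{
    return std::regex_match(s, r);
}

// regex_search with a std::smatch: returns the text of the first match,
// or an empty string if there is none.
std::string first_match(const std::string& s, const std::regex& r)
{
    std::smatch m;
    if (std::regex_search(s, m, r)) {
        return m.str();   // m.str() is the full matched text
    }
    return std::string();
}

// regex_replace: returns a copy of s with every match replaced by fmt.
std::string replace_all(const std::string& s, const std::regex& r,
                        const std::string& fmt)
{
    return std::regex_replace(s, r, fmt);
}
```

With the vowel regex from earlier, contains("George", r) is true but matches_whole("George", r) is false, because the expression describes a single character. replace_all("George Kelly Flanagin", r, "*") yields "G**rg* K*ll* Fl*n*g*n".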