Using Regular Expression Metacharacters with .NET – Comments and Mode Modifiers
With many flavors, the regex modes and match modes described earlier can be modified within the regex by the following constructs.
Mode modifier: (?modifier), such as (?i) or (?-i)
Many flavors now allow some of the regex and match modes to be set within the regular expression itself. A common example is the special notation (?i), which turns on case-insensitive matching, and (?-i), which turns it off. For example, <B>(?i)very(?-i)</B> has the very part match with case insensitivity, while still keeping the tag names case-sensitive. This matches ‘<B>VERY</B>’ and ‘<B>Very</B>’, for example, but not ‘<b>Very</b>’.
This example works with most systems that support (?i), including Perl, PHP, java.util.regex, Ruby, and the .NET languages. It doesn’t work with Python or Tcl, neither of which supports (?-i).
With most implementations except Python, the effects of (?i) within any type of parentheses are limited by the parentheses (that is, they turn off at the closing parenthesis). So, the (?-i) can be eliminated by wrapping the case-insensitive part in parentheses and putting (?i) as the first thing inside: <B>(?:(?i)very)</B>.
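As a minimal sketch of the two forms above, assuming java.util.regex (one of the flavors the text lists as supporting both (?i) and (?-i)), the following compares the explicit on/off version with the parenthesized version. The class and field names are illustrative only.

```java
import java.util.regex.Pattern;

public class ModeModifierDemo {
    // (?i) turns case-insensitive matching on; (?-i) turns it off again,
    // so only "very" is matched without regard to case.
    static final Pattern TOGGLED = Pattern.compile("<B>(?i)very(?-i)</B>");

    // Same intent: the (?i) is confined to the enclosing non-capturing group.
    static final Pattern SCOPED = Pattern.compile("<B>(?:(?i)very)</B>");

    public static void main(String[] args) {
        String[] samples = {"<B>VERY</B>", "<B>Very</B>", "<b>Very</b>"};
        for (String s : samples) {
            // Print whether each pattern matches the whole sample string.
            System.out.println(s + "  toggled=" + TOGGLED.matcher(s).matches()
                    + "  scoped=" + SCOPED.matcher(s).matches());
        }
    }
}
```

Both patterns should accept the first two samples and reject the third, because the tag names stay case-sensitive.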
The mode-modifier constructs support more than just ‘i’. With most systems, you can use at least those shown in Table 1. Some systems have additional letters for additional functions. PHP, in particular, offers quite a few extra, as does Tcl (see its documentation).
Table 1: Common Mode Modifiers
| Letter | Mode |
| i | case-insensitivity match mode |
| x | free-spacing and comments regex mode |
| s | dot-matches-all match mode |
| m | enhanced line-anchor match mode |
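To make the remaining letters in Table 1 concrete, here is a small sketch, again assuming java.util.regex. The class, method names, and sample patterns are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModeLettersDemo {
    // (?s): dot-matches-all, so '.' also matches a newline.
    static boolean dotMatchesAll(String text) {
        return Pattern.compile("(?s)a.b").matcher(text).matches();
    }

    // Default mode for comparison: '.' does not match a newline.
    static boolean dotDefault(String text) {
        return Pattern.compile("a.b").matcher(text).matches();
    }

    // (?m): enhanced line anchors, so '^' matches at the start of each line.
    static int countLineStarts(String text) {
        Matcher m = Pattern.compile("(?m)^\\w+").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // (?x): free-spacing mode; whitespace is ignored and '#' starts a comment
    // that runs to the end of the line.
    static boolean freeSpacing(String text) {
        String regex = "(?x) \\d{3}   # first three digits\n"
                     + "     -        # a literal hyphen\n"
                     + "     \\d{4}   # last four digits";
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesAll("a\nb"));
        System.out.println(dotDefault("a\nb"));
        System.out.println(countLineStarts("one\ntwo\nthree"));
        System.out.println(freeSpacing("555-1234"));
    }
}
```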
Mode-modified span: (?modifier:...), such as (?i:...)
The example from the previous section can be made even simpler for systems that support a mode-modified span. Using a syntax like (?i:...), a mode-modified span turns on the mode only for what’s matched within the parentheses. Using this, the <B>(?:(?i)very)</B> example is simplified to <B>(?i:very)</B>. When supported, this form generally works for all mode-modifier letters the system supports. Tcl and Python are two examples that support the (?i) form, but not the mode-modified span (?i:...) form.
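A short sketch of the span form, assuming java.util.regex; the class and field names are illustrative. In this flavor the span is also non-capturing, which the group count confirms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModeSpanDemo {
    // (?i:very) applies case-insensitivity only to "very";
    // the surrounding tag names remain case-sensitive.
    static final Pattern SPAN = Pattern.compile("<B>(?i:very)</B>");

    public static void main(String[] args) {
        String[] samples = {"<B>VERY</B>", "<B>Very</B>", "<b>Very</b>"};
        for (String s : samples) {
            Matcher m = SPAN.matcher(s);
            System.out.println(s + " -> " + m.matches());
        }
        // The mode-modified span does not create a capturing group.
        System.out.println("groups: " + SPAN.matcher("").groupCount());
    }
}
```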
Comments: (?#...) and #...
Some flavors support comments via (?#...). In practice, this is rarely used; the free-spacing and comments regex mode is generally preferred. However, this type of comment is particularly useful in languages for which it’s difficult to get a newline into a string literal, such as VB.NET.
Literal-text span: \Q...\E
First introduced with Perl, the special sequence \Q...\E turns off all regex metacharacters between \Q and \E, except for \E itself. (If the \E is omitted, they are turned off until the end of the regex.) It allows what would otherwise be taken as normal metacharacters to be treated as literal text. This is especially useful when including the contents of a variable while building a regular expression.
For example, to respond to a web search, you might accept what the user types as $query, and search for it with m/$query/i. As it is, this would certainly have unexpected results if $query were to contain, say, ‘C:\WINDOWS\’, which results in a run-time error because the search term contains something that isn’t a valid regular expression (the trailing lone backslash).
\Q\E avoids the problem. With the Perl code m/\Q$query\E/i, a $query of ‘C:\WINDOWS\’ becomes C\:\\WINDOWS\\, resulting in a search that finds the original ‘C:\WINDOWS\’ as the user expects.
This feature is less useful in systems with procedural and object-oriented handling, as they accept normal strings. While building the string to be used as a regular expression, it’s fairly easy to call a function to make the value from the variable "safe" for use in a regular expression. In VB, for example, one would use the Regex.Escape method; PHP has the preg_quote function; Java has the Pattern.quote method.
The only regex engines that I know of that support \Q\E are java.util.regex and PCRE (and hence also PHP’s preg suite). Considering that I just mentioned that this was introduced with Perl (and I gave an example in Perl), you might wonder why I don’t include Perl in the list. Perl supports \Q\E within regex literals (regular expressions appearing directly in the program), but not within the contents of variables that might be interpolated into them. See Chapter 7 for details.
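The following sketch, assuming java.util.regex, shows both approaches mentioned above: a \Q...\E span written directly in a pattern, and the quoting method applied to a user-supplied string such as the ‘C:\WINDOWS\’ query. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class LiteralSpanDemo {
    // \Q...\E written directly in the pattern: '+' is treated as literal text.
    static final Pattern FIXED = Pattern.compile("\\Q1+1=2\\E");

    // Returns true if the string compiles as a regular expression.
    static boolean isValidRegex(String s) {
        try {
            Pattern.compile(s);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Case-insensitive search for the query as literal text.
    // Pattern.quote makes every character of the query literal.
    static boolean containsLiteral(String haystack, String query) {
        Pattern p = Pattern.compile(Pattern.quote(query), Pattern.CASE_INSENSITIVE);
        return p.matcher(haystack).find();
    }

    public static void main(String[] args) {
        String query = "C:\\WINDOWS\\"; // the text C:\WINDOWS\ as a Java literal
        System.out.println(isValidRegex(query));
        System.out.println(containsLiteral("found in c:\\windows\\system32", query));
        System.out.println(FIXED.matcher("1+1=2").matches());
    }
}
```

Using the quoting method rather than concatenating \Q and \E by hand avoids having to think about a query that itself contains \E.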




