Command Metacharacters

Open Parser supports the standard set of Java RegEx character class metacharacters in the %Tokenize and @RegEx commands. A metacharacter is a character that carries special meaning in pattern matching. The supported metacharacters are:

([{\^-$|]})?*+.

There are two ways to force a metacharacter to be treated as an ordinary character:

  • Precede the metacharacter with a backslash.
  • Enclose it within \Q (which starts the quote) and \E (which ends it).
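Both escaping forms can be tried directly against java.util.regex, the engine whose rules this section refers to. The sketch below is illustrative only: it uses plain Java rather than Open Parser commands, and the class name EscapeDemo and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Backslash form: prefix one metacharacter so it is matched literally.
    static String backslashEscape(char metacharacter) {
        return "\\" + metacharacter;
    }

    // Quote form: everything between \Q and \E is matched literally.
    // Assumes the text itself does not contain "\E".
    static String quoteEscape(String text) {
        return "\\Q" + text + "\\E";
    }

    public static void main(String[] args) {
        // Unescaped, '.' matches any character, so "a.c" also matches "abc".
        System.out.println(Pattern.matches("a.c", "abc"));      // true

        // Escaped with a backslash, '.' matches only a literal period.
        String escaped = "a" + backslashEscape('.') + "c";
        System.out.println(Pattern.matches(escaped, "abc"));    // false
        System.out.println(Pattern.matches(escaped, "a.c"));    // true

        // Quoted with \Q...\E, every character of "1+1=2" is literal.
        System.out.println(Pattern.matches(quoteEscape("1+1=2"), "1+1=2")); // true
    }
}
```

In Java source code each backslash must itself be doubled inside a string literal, which is why the helpers build "\\" rather than "\". In plain Java, Pattern.quote produces the same \Q...\E wrapping and also copes with text that contains \E.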

%Tokenize follows the rules for Java regular expression character classes, not the rules for Java regular expressions as a whole.

In general, the reserved characters for a character set are:

  • '[' and ']' delimit a nested set.
  • '-' is a metacharacter when it appears between two other characters, where it denotes a range.
  • '^' is a metacharacter when it is the first character in a set, where it negates the set.
  • '&&' is a metacharacter when it appears between two other characters, where it denotes an intersection.
  • '\' means that the next character is a literal.
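The character set rules listed here can be checked with a few one-character matches in java.util.regex. This is a minimal sketch in plain Java, not Open Parser syntax; the class name CharClassDemo and its matches helper are hypothetical.

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    // Returns true when the whole input matches the given character class.
    static boolean matches(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        // '-' between two characters is a range; escaped, it is a literal hyphen.
        System.out.println(matches("[a-c]", "b"));     // true
        System.out.println(matches("[a\\-c]", "b"));   // false
        System.out.println(matches("[a\\-c]", "-"));   // true

        // '^' negates the set only when it comes first.
        System.out.println(matches("[^abc]", "a"));    // false
        System.out.println(matches("[a^bc]", "^"));    // true

        // '[' and ']' inside a set open a nested set (a union here).
        System.out.println(matches("[a[x-z]]", "y"));  // true

        // '&&' intersects two sets: lowercase letters that are not vowels.
        System.out.println(matches("[a-z&&[^aeiou]]", "b")); // true
        System.out.println(matches("[a-z&&[^aeiou]]", "e")); // false
    }
}
```

The same character changes role with its position: '^' and '-' are ordinary characters unless they sit where the rules give them meaning, which is why escaping is the safe choice whenever the position is in doubt.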

If you are unsure whether a character will be treated as a metacharacter and you want it treated as a literal, escape it with a backslash.