GridLab
Grid Application Toolkit

A simple API for Grid Applications
GAT

Next: Appendix: GATTable Up: Appendix: Regular Expressions Previous: Regular Expressions in Formal   Contents

POSIX Regular Expression Syntax

In this syntax, most characters are treated as literals: they match only themselves ($a$ matches $a$, $abc$ matches $abc$, etc.). The exceptions are called metacharacters:

  • . Matches any single character
  • [ ] Matches a single character that is contained within the brackets - $[abc]$ matches $a$, $b$, or $c$. $[a-z]$ matches any lowercase letter.
  • [^] Matches a single character that is not contained within the brackets - [^a-z] matches any single character that isn't a lowercase letter.
  • ^ Matches the start of the line
  • $ Matches the end of the line
  • ( ) Marks a part of the expression. The string matched by the enclosed expression can be recalled with \n, where $n$ is a digit from 1 to 9.
  • \n Where $n$ is a digit from 1 to 9: matches the exact string that was matched by the expression enclosed between the $n$th left parenthesis and its matching right parenthesis. This construct is not regular in the formal sense and has not been adopted in the extended regular expression syntax.
  • * A single-character expression followed by $*$ matches zero or more repetitions of that expression. For example, $[xyz]*$ matches the empty string $\epsilon$, $x$, $y$, $zx$, $zyx$, and so on. A $\backslash n*$, where $n$ is a digit from 1 to 9, matches zero or more repetitions of the exact string matched by the expression enclosed between the $n$th left parenthesis and its matching right parenthesis. As an example of such a back-reference, $a(..)\backslash 1$ matches $abcbc$ and $adede$ but not $abcde$. An expression enclosed in $($ and $)$ followed by $*$ is deemed invalid, and implementations differ in how they treat it. Some (e.g. /usr/bin/xpg4/grep on SunOS 5.8) match zero or more repetitions of the same string that the enclosed expression matched. Others (e.g. /usr/bin/grep on SunOS 5.8) match what the enclosed expression matches, followed by a literal $*$.
  • {x,y} Matches the last "block" at least $x$ and not more than $y$ times - $a\{3,5\}$ matches $aaa$, $aaaa$ or $aaaaa$.
  • + Matches the last "block" one or more times - $ba+$ matches $ba$, $baa$, $baaa$ and so on.
  • ? Matches the last "block" zero or one time - $ba?$ matches $b$ or $ba$.
  • | The choice (or set union) operator: match either the expression before or the expression after the operator - $abc\vert def$ matches $abc$ or $def$.
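The metacharacters listed above can be tried out with a short sketch. It uses Python's re module, which is not a POSIX implementation but, as far as these basic constructs go, behaves the same way. The helper name matches and the table CASES are hypothetical and exist only for this illustration; the back-reference pattern a(..)\1 is likewise chosen here as an example.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text, expected result) triples, one or more per metacharacter.
CASES = [
    (r"a.c", "axc", True),          # . matches any single character
    (r"[abc]", "b", True),          # bracket expression
    (r"[a-z]", "q", True),          # range inside brackets
    (r"[^a-z]", "q", False),        # negated bracket expression
    (r"[^a-z]", "Q", True),
    (r"[xyz]*", "", True),          # * allows zero repetitions
    (r"[xyz]*", "zyx", True),
    (r"a{3,5}", "aaaa", True),      # between 3 and 5 repetitions
    (r"a{3,5}", "aa", False),
    (r"ba+", "baaa", True),         # + needs at least one repetition
    (r"ba+", "b", False),
    (r"ba?", "b", True),            # ? allows zero or one repetition
    (r"ba?", "baa", False),
    (r"abc|def", "def", True),      # alternation
    (r"a(..)\1", "abcbc", True),    # \1 repeats what group 1 matched
    (r"a(..)\1", "abcde", False),
]

for pattern, text, expected in CASES:
    assert matches(pattern, text) == expected, (pattern, text)

# ^ and $ anchor a search to the start and end of the line.
assert re.search(r"^ab", "abc") is not None
assert re.search(r"^bc", "abc") is None
assert re.search(r"bc$", "abc") is not None
```

The helper uses re.fullmatch so that each case tests the whole string; the anchor cases use re.search instead, because anchors only make a difference when the match is allowed to start anywhere.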

Since the characters `(', `)', `[', `]', `.', `*', `?', `+', `^' and `$' are used as special symbols, they have to be "escaped" if they are meant literally. This is done by preceding them with a backslash `\', which therefore also has to be escaped this way (written `\\') if meant literally.
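A minimal sketch of the escaping rule, again using Python's re module as a stand-in for a POSIX implementation. The sample strings are made up for illustration; re.escape is a convenience of the Python standard library and not part of the POSIX syntax.

```python
import re

# Unescaped, the dot is a metacharacter and matches any character.
assert re.fullmatch(r"1.5", "1x5") is not None

# Escaped with a backslash, the dot only matches a literal dot.
assert re.fullmatch(r"1\.5", "1x5") is None
assert re.fullmatch(r"1\.5", "1.5") is not None

# A literal backslash is itself escaped, i.e. written twice in the pattern.
text_with_backslash = "a\\b"  # the three characters: a, backslash, b
assert re.fullmatch(r"a\\b", text_with_backslash) is not None

# re.escape inserts the backslashes needed to match a string literally.
literal = "cost: $5 (approx.)"
pattern = re.escape(literal)
assert re.fullmatch(pattern, literal) is not None
```

Raw string literals (the r"..." form) are used so that Python itself does not interpret the backslashes before the regular expression engine sees them.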


Andre Merzky 2004-05-13