Make a Perl-style regex interpreter behave like a basic or extended regex interpreter
- by Barry Brown
I am writing a tool to help students learn regular expressions. I will probably be writing it in Java.
The idea is this: the student types in a regular expression and the tool shows which parts of a text will get matched by the regex. Simple enough.
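For the display part, java.util.regex already reports the positions needed: Matcher.find() walks through successive matches, and start()/end() give each match's span. Below is a minimal sketch that wraps every match in brackets. The class name MatchHighlighter and the [ ] markers are illustrative choices only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchHighlighter {
    // Returns text with every match of regex wrapped in [ and ].
    public static String highlight(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the previous match
        while (m.find()) {
            out.append(text, last, m.start())     // unmatched stretch
               .append('[')
               .append(text, m.start(), m.end())  // matched stretch
               .append(']');
            last = m.end();
        }
        out.append(text.substring(last));         // tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("[0-9]+", "call 555-1234 now"));
    }
}
```

For example, highlight("[0-9]+", "call 555-1234 now") returns "call [555]-[1234] now". Matcher.find() steps past zero-length matches on its own, so a pattern such as x* cannot trap the loop.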
But I want to support several different regex "flavors" such as:
Basic regular expressions (think: grep)
Extended regular expressions (think: egrep)
A subset of Perl regular expressions, including the character classes \w, \s, etc.
Sed-style regular expressions
Java has the java.util.regex package (the Pattern and Matcher classes), but it supports only Perl-style regular expressions, which are largely a superset of the basic and extended REs. What I think I need is a way to take any given regular expression and escape the metacharacters that aren't special in a given flavor. Then I could pass the result to Pattern.compile() and it would behave as if it had been written for the selected RE interpreter.
For example, given the following regex:
^\w+[0-9]{5}-(\d{4})?$
Interpreted as a basic regular expression, it would be equivalent to this Perl-style regex:
^\\w\+[0-9]\{5\}-\(\\d\{4\}\)\?$
Interpreted as an extended regular expression, it would be equivalent to:
^\\w+[0-9]{5}-(\\d{4})?$
And as a Perl-style regex, it would be the same as the original expression.
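One way to do the escaping is a small scanner that walks the pattern once and rewrites each token for java.util.regex. The sketch below assumes strict POSIX rules, matching the examples above: \w and \d are not operators in BRE or ERE, so each becomes a literal backslash followed by a letter. FlavorTranslator, Flavor and toJava are hypothetical names. The sketch deliberately ignores POSIX classes such as [[:alpha:]], the position-dependent meaning of * and ^ in a BRE, and GNU extensions like \+.

```java
import java.util.regex.Pattern;

public class FlavorTranslator {
    public enum Flavor { BASIC, EXTENDED, PERL }

    // Characters java.util.regex treats as special outside a bracket expression.
    private static final String JAVA_META = "\\^$.|?*+()[]{}";

    // Rewrites a BRE or ERE (strict POSIX reading) into java.util.regex syntax.
    public static String toJava(String re, Flavor flavor) {
        if (flavor == Flavor.PERL) {
            return re; // assumed to be valid java.util.regex syntax already
        }
        boolean basic = flavor == Flavor.BASIC;
        // Ordinary characters in a BRE that Java would treat as operators.
        String literalUnescaped = basic ? "+?|(){}" : "";
        // BRE operators that are written with a backslash: \( \) \{ \}.
        String metaWhenEscaped = basic ? "(){}" : "";
        // Characters a backslash escapes to a literal (BRE also keeps \1-\9).
        String escapable = basic ? ".[]*^$\\123456789" : ".[]()*+?{}|^$\\";

        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < re.length()) {
            char c = re.charAt(i);
            if (c == '[') {
                i = copyBracket(re, i, out);
            } else if (c == '\\') {
                if (i + 1 >= re.length()) {
                    throw new IllegalArgumentException("trailing backslash");
                }
                char next = re.charAt(i + 1);
                if (metaWhenEscaped.indexOf(next) >= 0) {
                    out.append(next);               // BRE \( becomes Java (
                } else if (escapable.indexOf(next) >= 0) {
                    out.append('\\').append(next);  // same meaning in Java
                } else {
                    out.append("\\\\");             // \w, \d, ...: literal backslash
                    appendLiteral(out, next);       // followed by a literal character
                }
                i += 2;
            } else if (literalUnescaped.indexOf(c) >= 0) {
                out.append('\\').append(c);         // BRE + ? | ( ) { } are literals
                i++;
            } else {
                out.append(c);                      // same meaning in both syntaxes
                i++;
            }
        }
        return out.toString();
    }

    private static void appendLiteral(StringBuilder out, char c) {
        if (JAVA_META.indexOf(c) >= 0) {
            out.append('\\');
        }
        out.append(c);
    }

    // Copies a POSIX bracket expression starting at re.charAt(start) == '[' and
    // returns the index just past its closing ']'. POSIX treats '\' as ordinary
    // inside brackets, while Java gives '\', '[' and "&&" special meaning there.
    private static int copyBracket(String re, int start, StringBuilder out) {
        int i = start + 1;
        out.append('[');
        if (i < re.length() && re.charAt(i) == '^') { out.append('^'); i++; }
        if (i < re.length() && re.charAt(i) == ']') { out.append("\\]"); i++; }
        while (i < re.length() && re.charAt(i) != ']') {
            char c = re.charAt(i);
            if (c == '\\' || c == '[' || c == '&') {
                out.append('\\');
            }
            out.append(c);
            i++;
        }
        if (i >= re.length()) {
            throw new IllegalArgumentException("unterminated bracket expression");
        }
        out.append(']');
        return i + 1;
    }

    public static void main(String[] args) {
        String re = "^\\w+[0-9]{5}-(\\d{4})?$";
        for (Flavor f : Flavor.values()) {
            System.out.println(f + ": " + toJava(re, f));
        }
        // In a BRE, "a+" means 'a' followed by a literal '+'.
        System.out.println(Pattern.matches(toJava("a+", Flavor.BASIC), "a+")); // true
    }
}
```

Running toJava on the example pattern with Flavor.BASIC and Flavor.EXTENDED reproduces the two translations listed in the question, and Flavor.PERL returns the pattern unchanged.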
Is there a "regular expression for regular expressions" that I could run through a regex search-and-replace to escape the characters that are metacharacters in Perl but not in the chosen flavor? What else could I do? Are there alternative Java classes I could use?
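On the "regular expression for regular expressions" idea: a regex is awkward for parsing a whole pattern, but it can tokenize a POSIX pattern into bracket expressions, backslash escapes, and single characters, and Matcher.replaceAll (the Function overload, Java 9 or later) can then rewrite each token. Below is a minimal sketch for the ERE case only, under the same strict-POSIX assumptions as the examples. The class name EreToJava and the tokenizer pattern are hypothetical, and a BRE version would need different rewrite rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EreToJava {
    // One token per match: a bracket expression, a backslash escape, or any other
    // single character. POSIX classes such as [[:alpha:]] are not recognized.
    private static final Pattern TOKEN =
            Pattern.compile("\\[\\^?\\]?[^\\]]*\\]|\\\\.|.", Pattern.DOTALL);

    // Characters a backslash escapes to a literal in an ERE. This set contains
    // every java.util.regex metacharacter, so any other "\x" can safely become
    // a literal backslash followed by x.
    private static final String ERE_ESCAPABLE = ".[]()*+?{}|^$\\";

    public static String toJava(String ere) {
        Matcher m = TOKEN.matcher(ere);
        // quoteReplacement stops '\' and '$' in the rewritten token from being
        // read as replacement-string syntax.
        return m.replaceAll(r -> Matcher.quoteReplacement(rewrite(r.group())));
    }

    private static String rewrite(String tok) {
        if (tok.length() > 1 && tok.charAt(0) == '[') {
            // Bracket expression: POSIX treats '\' literally inside brackets and
            // Java does not, so escape '\', '[', ']' and '&' in the body.
            String body = tok.substring(1, tok.length() - 1)
                    .replace("\\", "\\\\").replace("[", "\\[")
                    .replace("]", "\\]").replace("&", "\\&");
            return "[" + body + "]";
        }
        if (tok.length() == 2 && tok.charAt(0) == '\\') { // escape such as \. or \w
            char c = tok.charAt(1);
            return ERE_ESCAPABLE.indexOf(c) >= 0 ? tok : "\\\\" + c;
        }
        if (tok.equals("\\")) {
            return "\\\\"; // lone trailing backslash: keep it literal
        }
        return tok; // ERE and Java agree on unescaped characters
    }

    public static void main(String[] args) {
        System.out.println(toJava("^\\w+[0-9]{5}-(\\d{4})?$"));
    }
}
```

On the example pattern this prints ^\\w+[0-9]{5}-(\\d{4})?$, the extended-RE translation given in the question.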