Make a Perl-style regex interpreter behave like a basic or extended regex interpreter

Posted by Barry Brown on Stack Overflow
Published on 2008-10-22T20:56:09Z

I am writing a tool to help students learn regular expressions. I will probably be writing it in Java.

The idea is this: the student types in a regular expression and the tool shows which parts of a text will get matched by the regex. Simple enough.
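The matching half of such a tool can be sketched with `java.util.regex.Pattern` and `Matcher.find()`, which steps through successive matches and reports each one's `start()` and `end()` offsets. The class name `MatchHighlighter` and the use of square brackets as the highlight marker are illustrative assumptions; a graphical front end could use the same offsets to color the text instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchHighlighter {

    // Wraps every match of regex in square brackets so a student can see
    // which parts of the text were matched.
    static String highlight(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0; // end offset of the previous match
        while (m.find()) {
            out.append(text, last, m.start());             // text before this match
            out.append('[').append(m.group()).append(']'); // the match itself
            last = m.end();
        }
        out.append(text.substring(last));                  // text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("[0-9]+", "room 101, floor 3"));
        // prints: room [101], floor [3]
    }
}
```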

But I want to support several different regex "flavors" such as:

  • Basic regular expressions (think: grep)
  • Extended regular expressions (think: egrep)
  • A subset of Perl regular expressions, including the character classes \w, \s, etc.
  • Sed-style regular expressions

Java has the java.util.regex package, but its Pattern class supports only Perl-style regular expressions, which are a superset of the basic and extended REs. What I think I need is a way to take any given regular expression and escape the metacharacters that aren't part of a given flavor. Then I could pass the result to Pattern.compile() and it would behave as if it were written for the selected RE interpreter.

For example, given the following regex:

^\w+[0-9]{5}-(\d{4})?$

Treated as a basic regular expression, it would translate to this Java regex:

^\\w\+[0-9]\{5\}-\(\\d\{4\}\)\?$

Treated as an extended regular expression, it would become:

^\\w+[0-9]{5}-(\\d{4})?$

And as a Perl-style regex, it would be the same as the original expression.
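A minimal sketch of the escaping idea, assuming strict POSIX rules as the example above does, is a character-by-character translator rather than a single search-and-replace: whether a character needs escaping depends on whether it follows a backslash or sits inside a bracket expression, which a scanner can track directly. `FlavorTranslator` and its `Flavor` enum are hypothetical names. Characters that Java treats as special but the chosen flavor does not get a backslash, the BRE escapes `\(`, `\)`, `\{` and `\}` lose theirs, and escapes POSIX leaves undefined, such as `\w` and `\d`, become a literal backslash followed by the letter. Since sed uses basic syntax by default, a sed mode could reuse the BRE rules. The sketch does not translate POSIX named classes like `[:alpha:]`, treats `^` as an anchor and `*` as a quantifier everywhere (POSIX BRE makes them literal in some positions), and ignores GNU extensions such as `\+` in grep.

```java
import java.util.regex.Pattern;

public class FlavorTranslator {

    /** Flavors handled by this sketch; PERL means "already Java syntax". */
    enum Flavor { BRE, ERE, PERL }

    // Characters that are special when unescaped in POSIX BRE and ERE.
    private static final String BRE_META = ".[]\\*^$";
    private static final String ERE_META = ".[]\\()*+?{}|^$";
    // Characters that are special to java.util.regex outside a character class.
    private static final String JAVA_META = "\\^$.|?*+()[]{}";

    /** Rewrites a BRE or ERE into a java.util.regex pattern (simplified). */
    static String toJava(String re, Flavor flavor) {
        if (flavor == Flavor.PERL) {
            return re; // Java's syntax is already Perl-style
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < re.length()) {
            char c = re.charAt(i);
            if (c == '[') {
                i = copyBracket(re, i, out);
            } else if (c == '\\' && i + 1 < re.length()) {
                out.append(translateEscape(re.charAt(i + 1), flavor));
                i += 2;
            } else {
                out.append(translatePlain(c, flavor));
                i++;
            }
        }
        return out.toString();
    }

    // Escapes c so that Java reads it as a literal character.
    private static String javaLiteral(char c) {
        return JAVA_META.indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    // Unescaped character: keep it if the flavor treats it as special,
    // otherwise make Java see a literal (e.g. '+' or '{' in a BRE).
    // A trailing lone backslash or a stray ']' also becomes a literal.
    private static String translatePlain(char c, Flavor flavor) {
        String meta = flavor == Flavor.BRE ? BRE_META : ERE_META;
        boolean special = meta.indexOf(c) >= 0 && c != '\\' && c != ']';
        return special ? String.valueOf(c) : javaLiteral(c);
    }

    // A backslash followed by 'next'.
    private static String translateEscape(char next, Flavor flavor) {
        if (flavor == Flavor.BRE && "(){}".indexOf(next) >= 0) {
            return String.valueOf(next);   // BRE \( \) \{ \} are groups/intervals
        }
        if (flavor == Flavor.BRE && next >= '1' && next <= '9') {
            return "\\" + next;            // BRE back-reference, same syntax in Java
        }
        String meta = flavor == Flavor.BRE ? BRE_META : ERE_META;
        if (meta.indexOf(next) >= 0) {
            return javaLiteral(next);      // escaped metacharacter stays literal
        }
        // Undefined in POSIX (e.g. \w, \d): literal backslash plus the character.
        return javaLiteral('\\') + javaLiteral(next);
    }

    // Copies a bracket expression starting at re[start] == '[' and returns the
    // index after its closing ']'. Backslash, '[' and '&' are literal inside
    // POSIX brackets but special inside Java classes, so they are escaped.
    private static int copyBracket(String re, int start, StringBuilder out) {
        int i = start + 1;
        out.append('[');
        if (i < re.length() && re.charAt(i) == '^') {
            out.append('^');
            i++;
        }
        if (i < re.length() && re.charAt(i) == ']') {
            out.append("\\]");             // a leading ']' is a literal member
            i++;
        }
        while (i < re.length() && re.charAt(i) != ']') {
            char c = re.charAt(i);
            if (c == '\\' || c == '[' || c == '&') {
                out.append('\\');
            }
            out.append(c);
            i++;
        }
        out.append(']');                   // also closes an unterminated bracket
        return i + 1;
    }

    public static void main(String[] args) {
        String re = "^\\w+[0-9]{5}-(\\d{4})?$";
        for (Flavor f : Flavor.values()) {
            String java = toJava(re, f);
            Pattern.compile(java);         // throws if the translation is invalid
            System.out.println(f + ": " + java);
        }
    }
}
```

Running `main` should print the BRE and ERE translations given above, followed by the unchanged Perl-style pattern. Keeping the rules in per-flavor tables of metacharacters is one way to add further flavors later without touching the scanning loop.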

Is there a "regular expression for regular expressions" that I could run through a regex search-and-replace to escape the characters that are special in Java but not in the chosen flavor? What else could I do? Are there alternative Java classes I could use?

© Stack Overflow or respective owner
