How do I best do balanced quoting with Perl's Regexp::Grammars?

Posted by Evan Carroll on Stack Overflow
Published on 2010-06-15T16:34:20Z

Using Damian Conway's Regexp::Grammars, I'm trying to match different balanced quoting mechanisms ('foo' and "foo", but not 'foo"), such as parens, single quotes, double quotes, and double dollars. This is the code I'm currently using:

<token: pair>        \'<literal>\'|\"<literal>\"|\$\$<literal>\$\$
<token: literal>    [\S]+
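The pair token above spells out every delimiter twice. For the symmetric delimiters, the same "closing delimiter must equal the opening one" check can be sketched in plain core Perl with a backreference: capture the opening delimiter, then require \1 at the end. This is a swapped-in technique (an ordinary Perl regex, not a Regexp::Grammars grammar), and unquote is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (plain Perl regex, not Regexp::Grammars):
# capture the opening delimiter in $1 and require the same text again
# via the backreference \1, so 'foo', "foo" and $$foo$$ match but
# 'foo" does not. Returns the text between the delimiters, or undef.
sub unquote {
    my ($text) = @_;
    return $text =~ m{\A ( ' | " | \$\$ ) (\S+?) \1 \z}x ? $2 : undef;
}

for my $input (q{'foo'}, q{"foo"}, q{$$foo$$}, q{'foo"}) {
    my $literal = unquote($input);
    printf "%-8s => %s\n", $input, defined $literal ? $literal : '(no match)';
}
```

A backreference only works for delimiters that open and close with the same text; an asymmetric pair such as ( foo ) still needs its own alternative.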

This generally works fine and allows me to say something like:

<rule: quote>            QUOTE <.as>? <pair>
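For comparison, the quote rule above can be sketched as one plain-Perl (5.10 or later) pattern, not a Regexp::Grammars grammar, with the pair alternatives written inline and the ( foo ) form added. It assumes from the example output that <.as> matches the keyword AS; $quote_re and parse_quote are hypothetical names. The branch reset group (?| ... ) numbers the capture in every alternative as $1, so the literal comes back directly with no intermediate pair level.

```perl
use strict;
use warnings;

# Plain Perl 5.10+ sketch, not Regexp::Grammars. The "QUOTE [AS] <quoted>"
# shape is assumed from the example input. Inside (?| ... ) every branch
# numbers its capture as $1, whichever delimiter matched.
my $quote_re = qr{
    \A QUOTE \s+ (?: AS \s+ )?
    (?|
        ' (\S+?) '                  # single quotes
      | " (\S+?) "                  # double quotes
      | \$\$ (\S+?) \$\$            # double dollars
      | \( \s* (\S+?) \s* \)        # parenthesised, spaces allowed
    )
    \s* \z
}x;

# Hypothetical helper: returns { quote => LITERAL } on success, undef otherwise.
sub parse_quote {
    my ($text) = @_;
    return $text =~ $quote_re ? +{ quote => $1 } : undef;
}

for my $input (q{QUOTE AS ','}, q{QUOTE "foo"}, q{QUOTE $$foo$$},
               q{QUOTE ( foo )}, q{QUOTE 'foo"}) {
    my $r = parse_quote($input);
    printf "%-15s => %s\n", $input, $r ? $r->{quote} : '(no match)';
}
```

Branch reset is used here instead of a separate capture per branch so the literal always lands in $1; the cost is that the alternatives are no longer a named, reusable token.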

My question is: how do I reshape the output to get rid of the needless nesting introduced by the pair token?

{
  '' => 'QUOTE AS \',\'',
  'quote' => {
               '' => 'QUOTE AS \',\'',
               'pair' => {
                           'literal' => ',',
                           '' => '\',\''
                         }
             }
},

Here, there is obviously no reason to have pair sitting between quote and its literal value. Is there a better way to match 'foo', "foo", $$foo$$, and sometimes ( foo ), without creating a needless pair token each time? Can I preprocess that token out, or fold it into the rule above? Or can I write a better construct that eliminates the need for it entirely?
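If the grammar itself stays as it is, one way to drop the pair level is to post-process the result hash after matching. This is a minimal core-Perl sketch, assuming the result has the shape shown above (nested hashes with '' context keys); fold_pair is a hypothetical helper, not part of Regexp::Grammars.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Grammars): walk a result hash,
# drop the '' context entries, and replace each pair hash with its
# literal value, stored one level up under the key literal.
sub fold_pair {
    my ($node) = @_;
    return $node unless ref $node eq 'HASH';

    my %folded;
    for my $key (grep { $_ ne '' } keys %$node) {
        my $child = $node->{$key};
        if ($key eq 'pair' && ref $child eq 'HASH') {
            # Lift the literal up in place of the whole pair hash.
            $folded{literal} = fold_pair($child->{literal});
        }
        else {
            $folded{$key} = fold_pair($child);
        }
    }
    return \%folded;
}

# The result shape from the question, typed in by hand.
my $result = {
    ''    => q{QUOTE AS ','},
    quote => {
        ''   => q{QUOTE AS ','},
        pair => { literal => ',', '' => q{','} },
    },
};

my $folded = fold_pair($result);
print "quote literal: $folded->{quote}{literal}\n";
```

Regexp::Grammars also documents ways to control a rule's result inside the grammar, such as assigning to $MATCH or aliasing a subrule call to MATCH (for example <MATCH=literal>) so the rule returns that value instead of a hash; that would avoid a separate pass, but the exact syntax should be checked against the documentation for the installed version.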

© Stack Overflow or respective owner
