How do I create something like a negated character class with a string instead of characters?

Posted by Chas. Owens on Stack Overflow
Published on 2010-03-17T20:50:06Z Indexed on 2010/03/17 23:51 UTC

I am trying to write a tokenizer for Mustache in Perl. I can easily handle most of the tokens like this:

#!/usr/bin/perl

use strict;
use warnings;

my $comment  = qr/ \G \{\{ !  (?<comment>  .+? ) }}              /xs; 
my $variable = qr/ \G \{\{    (?<variable> .+? ) }}              /xs; 
my $text     = qr/ \G         (?<text>     .+? ) (?= \{\{ | \z ) /xs; 
my $tokens   = qr/ $comment | $variable | $text /x;

my $s = do { local $/; <DATA> };

while ($s =~ /$tokens/g) {
    my ($type)    = keys %+;
    (my $contents = $+{$type}) =~ s/\n/\\n/g;

    print "type [$type] contents [$contents]\n";
}

__DATA__
{{!this is a comment}}
Hi {{name}}, I like {{thing}}.

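But I am running into trouble with the Set Delimiters directive. With the pattern below, the lazy .+? in start is free to run past the }} of an earlier tag, so everything from {{name}} through {{(= =)}} is matched as a single directive: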

#!/usr/bin/perl

use strict;
use warnings;

my $delimiters = qr/ \G \{\{    (?<start> .+? ) = [ ] = (?<end> .+?) }} /xs; 
my $comment    = qr/ \G \{\{ !  (?<comment>  .+? ) }}                   /xs; 
my $variable   = qr/ \G \{\{    (?<variable> .+? ) }}                   /xs; 
my $text       = qr/ \G         (?<text>     .+? ) (?= \{\{ | \z )      /xs; 
my $tokens     = qr/ $comment | $delimiters | $variable | $text /x;

my $s = do { local $/; <DATA> };

while ($s =~ /$tokens/g) {
    for my $type (keys %+) {
        (my $contents = $+{$type}) =~ s/\n/\\n/g;

        print "type [$type] contents [$contents]\n";
    }
}

__DATA__
{{!this is a comment}}
Hi {{name}}, I like {{thing}}.
{{(= =)}}

If I change it to

my $delimiters = qr/ \G \{\{ (?<start> [^{]+? ) = [ ] = (?<end> .+?) }} /xs;

it works fine, but the whole point of the Set Delimiters directive is to change the delimiters, so the code will wind up looking like

my $variable = qr/ \G $start (?<variable> .+? ) $end /xs;

And it is perfectly valid to say {{{== ==}}} (i.e. change the delimiters to {= and =}). What I want, but maybe not what I need, is the ability to say something like (?:not starting string)+?. I figure I am just going to have to give up being clean about it and drop code into the regex to force it to match only what I want. I am trying to avoid that for four reasons:

  1. I don't think it is very clean.
  2. It is marked as experimental.
  3. I am not very familiar with it (I think it comes down to (?{CODE}) and returning special values).
  4. I am hoping someone knows some other exotic feature that I am not familiar with that fits the situation better (e.g. (?(condition)yes-pattern|no-pattern)).
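
One candidate for the (?:not starting string)+? idea that needs no embedded code is a negative lookahead checked before every character. The following is only a minimal sketch under that assumption; not_containing is a hypothetical helper name that does not appear in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a pattern that lazily matches one or more
# characters, refusing to consume a character at any position where the
# literal $string begins.
sub not_containing {
    my ($string) = @_;
    my $quoted = quotemeta $string;    # treat the delimiter as literal text
    return qr/ (?: (?! $quoted ) . )+? /xs;
}

my $no_open = not_containing('{{');

# Anchor at both ends so the whole candidate must be free of "{{".
for my $candidate ('name', 'a{b', 'a{{b') {
    my $ok = $candidate =~ / \A $no_open \z /x ? 'accepted' : 'rejected';
    print "[$candidate] $ok\n";
}
```

A lone { is still allowed, since only the full two-character opener is excluded. Because the delimiter goes through quotemeta, the same helper should work for whatever string the directive installs later.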

Just to make things clear (I hope), I am trying to match, in order: a constant-length starting delimiter; the shortest string that allows a match and does not contain the starting delimiter; an equals sign, a space, and another equals sign; and then the shortest string that allows the overall match to end with the ending delimiter.
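
The description above could be sketched roughly as follows, using the = [ ] = separator from the $delimiters pattern and a per-character negative lookahead in place of .+? for the start capture. match_delimiters is a hypothetical helper introduced only for illustration, and rebuilding $variable from the captured strings is an assumption about how the result would be used.

```perl
use strict;
use warnings;

# Hypothetical helper: tries the Set Delimiters pattern at offset $pos of $s
# using the current delimiters; returns (start, end) or an empty list.
sub match_delimiters {
    my ($s, $pos, $open, $close) = @_;
    my $q_open  = quotemeta($open);
    my $q_close = quotemeta($close);

    pos($s) = $pos;    # \G anchors the match at this offset
    if ($s =~ / \G $q_open
                (?<start> (?: (?! $q_open ) . )+? )   # shortest run without the opener
                = [ ] =
                (?<end> .+? )
                $q_close /xsg) {
        return ($+{start}, $+{end});
    }
    return;
}

my $s = "Hi {{name}}, I like {{thing}}.\n{{(= =)}}";

for my $tag ('{{name', '{{(') {
    my @found = match_delimiters($s, index($s, $tag), '{{', '}}');
    print "at $tag: ",
        (@found ? "start [$found[0]] end [$found[1]]" : "no match"), "\n";
}

# Assumed follow-up: rebuild the variable pattern from the new delimiters.
my ($start, $end) = match_delimiters($s, index($s, '{{('), '{{', '}}');
my $variable = qr/ \G \Q$start\E (?<variable> .+? ) \Q$end\E /xs;
print "variable [", $+{variable}, "]\n" if '(name)' =~ $variable;
```

With this input the attempt at {{name}} fails, because the start capture cannot cross the {{ of {{thing}}, while the attempt at {{(= =)}} captures ( and ). The end capture is still a plain lazy .+?, so an input such as {{{== ==}}} may need extra care to decide which closing braces belong to the new delimiter.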

© Stack Overflow or respective owner
