Splitting a string according to a delimiter when elements in the string can contain the delimiter

Posted by Vivin Paliath on Stack Overflow
Published on 2010-03-17T20:16:54Z

I have a string that looks like this:

"#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"

I'd like to get back

[#Text(), #SomeMoreText(), #TextThatContainsDelimiter(#blah), #SomethingElse()]

One way I thought of doing this was to require that any # inside the parentheses be escaped as \#, which makes the input string:

"#Text() #SomeMoreText() #TextThatContainsDelimiter(\#blah) #SomethingElse()"

I can then split it using /[^\\]#/ which gives me:

[#Text(), SomeMoreText(), TextThatContainsDelimiter(\#blah), SomethingElse()]
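To make the behavior of that split concrete, here is a minimal sketch in Python. The language is an assumption, since the question does not name one, and the variable names `escaped` and `parts` are only illustrative. The pattern consumes the character in front of each matched #, which is the separating space here. The first # has no character before it, so it never matches and stays in the first element.

```python
import re

# Input with the inner '#' escaped as '\#', as proposed in the text.
escaped = r"#Text() #SomeMoreText() #TextThatContainsDelimiter(\#blah) #SomethingElse()"

# Split on any '#' whose preceding character is not a backslash.
# The preceding character is part of the match, so it is consumed as well.
parts = re.split(r"[^\\]#", escaped)

print(parts)
```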

The first element will contain # but I can strip it out. However, is there a cleaner way to do this without having to escape the #, and which ensures that the first element will not contain a #? Basically I'd like it to split by # only if the # is not enclosed by parentheses.
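One way to split on # only outside parentheses, with no escaping, is to scan the string once while tracking parenthesis depth. The sketch below is written in Python as an assumption about the language. The function name `split_top_level` is hypothetical, and the sketch assumes that each element should keep its leading # and that surrounding whitespace can be dropped.

```python
def split_top_level(text, delimiter="#"):
    """Split text into tokens that each start at a delimiter found outside parentheses."""
    tokens = []
    current = []
    depth = 0  # current parenthesis nesting level

    for ch in text:
        if ch == delimiter and depth == 0:
            # A top-level delimiter closes the previous token and opens a new one.
            token = "".join(current).strip()
            if token:
                tokens.append(token)
            current = [ch]
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Tolerate a stray ')' instead of letting depth go negative.
            depth = max(depth - 1, 0)
        current.append(ch)

    # Flush whatever remains after the last delimiter.
    token = "".join(current).strip()
    if token:
        tokens.append(token)
    return tokens


sample = "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
print(split_top_level(sample))
```

Because the scanner counts depth instead of matching a pattern, it also copes with nested parentheses such as `#A(#b(#c))`.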

My hunch is that since the meaning of # depends on its context (whether or not it sits inside parentheses) and regular expressions are only suited to regular languages, a regex may not be the right tool. If so, would I have to write a grammar for this and roll my own parser/lexer?
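As a rough illustration of where the limit lies, the Python sketch below (language and names are assumptions, not part of the question) matches whole elements instead of splitting. It assumes every element has the shape `#name(...)` and that parentheses are never nested. Under those assumptions a plain regular expression is enough. With nested parentheses the same pattern picks up the wrong pieces, and that is the case where a depth counter or a small parser becomes necessary.

```python
import re

# '#', a name, then a parenthesised body that contains no further parentheses.
ELEMENT = re.compile(r"#\w+\([^()]*\)")

flat = "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
flat_elements = ELEMENT.findall(flat)
print(flat_elements)

# Nested parentheses break the assumption: the outer element is not recovered.
nested = "#A(#b(#c)) #D()"
nested_elements = ELEMENT.findall(nested)
print(nested_elements)
```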

© Stack Overflow or respective owner
