Splitting a string according to a delimiter when elements in the string can contain the delimiter
Posted by Vivin Paliath on Stack Overflow
Published on 2010-03-17T20:16:54Z
Indexed on 2010/03/17 20:21 UTC
I have a string that looks like this:
"#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
I'd like to get back
[#Text(), #SomeMoreText(), #TextThatContainsDelimiter(#blah), #SomethingElse()]
One way I thought about doing this was to require that the # be escaped as \#, which makes the input string:
"#Text() #SomeMoreText() #TextThatContainsDelimiter(\#blah) #SomethingElse()"
I can then split it using /[^\\]#/, which gives me:
[#Text(), SomeMoreText(), TextThatContainsDelimiter(\#blah), SomethingElse()]
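To make the behavior of that split concrete, here is a minimal sketch in Python (the language is an assumption; the question does not name one). It reproduces the /[^\\]#/ split and contrasts it with a negative lookbehind, which tests for the backslash without consuming the preceding character, so the leading # is treated like every other delimiter.

```python
import re

# Input with the '#' inside the parentheses escaped as '\#'.
s = r"#Text() #SomeMoreText() #TextThatContainsDelimiter(\#blah) #SomethingElse()"

# [^\\]# consumes the character before each '#'. The leading '#' has no
# preceding character, so it never matches and stays in the first element.
consuming = re.split(r"[^\\]#", s)

# (?<!\\)# only asserts "not preceded by a backslash", so the leading '#'
# is also a split point. Empty pieces and surrounding spaces are dropped.
parts = [p.strip() for p in re.split(r"(?<!\\)#", s) if p.strip()]

print(consuming)
print(parts)
```

With the lookbehind, no element starts with #, but the escaped \# is still present inside the third element and the input still has to be escaped up front.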
The first element will contain #, but I can strip it out. However, is there a cleaner way to do this without having to escape the #, and that ensures that the first element will not contain a #? Basically, I'd like it to split by # only if the # is not enclosed by parentheses.
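One way to express "split by # only if the # is not enclosed by parentheses" without any escaping is a single pass that tracks parenthesis depth. The sketch below is in Python (an assumption, since the question names no language), and the function name split_outside_parens is hypothetical. It assumes the parentheses are balanced.

```python
def split_outside_parens(text, delimiter="#"):
    """Split text on delimiter, ignoring delimiters inside parentheses."""
    parts = []
    current = []
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == delimiter and depth == 0:
            # Top-level delimiter: close the current piece.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    # Drop empty pieces (such as the one before the leading '#') and
    # trim the spaces that separated the elements.
    return [p.strip() for p in parts if p.strip()]


s = "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
print(split_outside_parens(s))
# ['Text()', 'SomeMoreText()', 'TextThatContainsDelimiter(#blah)', 'SomethingElse()']
```

Because the function counts depth rather than matching a fixed pattern, it also handles nested parentheses. Prefixing each piece with "#" recovers the form shown in the desired output.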
My hunch is that since the meaning of # depends on its context, and regular expressions are only suited for regular languages, this may not be the right tool. If so, would I have to write a grammar for this and roll my own parser/lexer?
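Whether a parser is needed depends on nesting. If the parentheses never nest (an assumption; the example shows only one level), the input is still a regular language, and matching whole elements instead of splitting may be enough. The sketch below is in Python, and the pattern and the name tokens are hypothetical: each element is taken to be a #, a run of word characters, and one parenthesized group that contains no further parentheses.

```python
import re

# '#', then a name, then '(' ... ')' with no parentheses inside the group.
# A '#' inside the group is allowed because [^()] matches it.
TOKEN = re.compile(r"#\w+\([^()]*\)")


def tokens(text):
    """Return every top-level '#Name(...)' element found in text."""
    return TOKEN.findall(text)


s = "#Text() #SomeMoreText() #TextThatContainsDelimiter(#blah) #SomethingElse()"
print(tokens(s))
# ['#Text()', '#SomeMoreText()', '#TextThatContainsDelimiter(#blah)', '#SomethingElse()']
```

Once parentheses can nest to arbitrary depth, a classical regular expression can no longer track the matching close, and a depth counter or a small parser becomes the more suitable tool.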
© Stack Overflow or respective owner