Regex for ignoring consecutive quotation marks in string
Posted by will-hart on Stack Overflow
Published on 2014-06-13T02:14:13Z
Indexed on 2014/06/13 3:25 UTC
I have built a parser in Sprache and C# for files using a format I don't control. Using it I can correctly convert:
a = "my string";
into
my string
The parser (for the quoted text only) currently looks like this:
public static readonly Parser<string> QuotedText =
    from open in Parse.Char('"').Token()
    from content in Parse.CharExcept('"').Many().Text().Token()
    from close in Parse.Char('"').Token()
    select content;
However, the format I'm working with escapes quotation marks by doubling them ("double double" quotes), e.g.:
a = "a ""string"".";
When I attempt to parse this, nothing is returned. It should return:
a ""string"".
Additionally,
a = "";
should be parsed into string.Empty or similar.
I've tried regexes, unsuccessfully, based on answers like this, doing things like "(?:[^;])*", or:
public static readonly Parser<string> QuotedText =
    from content in Parse.Regex(@"""(?:[^;])*""").Token()
    select content;
This doesn't work (i.e. no matches are returned in the above cases). I think my beginner's regex skills are getting in the way. Does anybody have any hints?
EDIT: I was testing it here - http://regex101.com/r/eJ9aH1
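For illustration, the behaviour described above can be sketched with a plain .NET regex, independent of Sprache. This is only a sketch under assumptions: the class name QuotedTextSketch and the method Extract are hypothetical, and the pattern treats a doubled quote as the only escape the format uses. It reads an opening quote, then any run of non-quote characters or doubled quotes, then the closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Sprache or of the original parser.
public static class QuotedTextSketch
{
    // Pattern: "((?:[^"]|"")*)"
    // In a C# verbatim string every literal " is written as "".
    private static readonly Regex Quoted =
        new Regex(@"""((?:[^""]|"""")*)""");

    // Returns the text between the outer quotes, leaving doubled quotes
    // untouched, or null when the input contains no quoted string.
    public static string Extract(string input)
    {
        Match m = Quoted.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Under these assumptions, Extract should give my string for the first input, a ""string"". for the doubled-quote input, and an empty string for a = "";. The same pattern could presumably be passed to Sprache's Parse.Regex, although that would be expected to return the whole match including the outer quotes, which would then need trimming; this has not been checked against Sprache here.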
© Stack Overflow or respective owner