Regex for ignoring consecutive quotation marks in string

Posted by will-hart on Stack Overflow
Published on 2014-06-13T02:14:13Z

I have built a parser in Sprache and C# for files using a format I don't control. Using it I can correctly convert:

a = "my string";

into

my string

The parser (for the quoted text only) currently looks like this:

public static readonly Parser<string> QuotedText =
    from open in Parse.Char('"').Token()
    from content in Parse.CharExcept('"').Many().Text().Token()
    from close in Parse.Char('"').Token()
    select content;

However, the format I'm working with escapes quotation marks by doubling them ("double double" quotes), e.g.:

a = "a ""string"".";

When I attempt to parse this, nothing is returned. It should return:

a ""string"".

Additionally,

a = "";

should be parsed into string.Empty or similar.
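For reference, the requirements above could be sketched directly in Sprache without a regex. The current parser stops at the first '"' it meets, so a doubled quote ends the content early. The sketch below treats two consecutive quotes as part of the content instead. The names QuotedTextSketch, EscapedQuote and OrdinaryChar are hypothetical, and moving Token() to the outside (so whitespace inside the quotes is preserved) is an assumption about the desired behaviour, not something the format specifies.

```csharp
using Sprache;

public static class QuotedTextSketch
{
    // Hypothetical helper: an escaped quote is two consecutive '"' characters.
    // It is kept as-is in the result, since the expected output retains the "".
    private static readonly Parser<string> EscapedQuote =
        Parse.String("\"\"").Text();

    // Hypothetical helper: any single character that is not a quote.
    private static readonly Parser<string> OrdinaryChar =
        Parse.CharExcept('"').Once().Text();

    // Opening quote, then any mix of escaped quotes and ordinary characters,
    // then the closing quote. Many() also accepts zero items, so "" parses
    // to an empty string.
    public static readonly Parser<string> QuotedText =
        (from open in Parse.Char('"')
         from content in EscapedQuote.Or(OrdinaryChar).Many()
         from close in Parse.Char('"')
         select string.Concat(content)).Token();
}
```

EscapedQuote is tried first so that a pair of quotes is consumed as content; a lone quote fails both alternatives, which ends Many() and leaves that quote for the closing Parse.Char('"').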

I've tried regexes, without success, based on answers like this, doing things like "(?:[^;])*", or:

public static readonly Parser<string> QuotedText =
    from content in Parse.Regex("""(?:[^;])*""").Token()
    select content;

This doesn't work (i.e. no matches are returned in the above cases). I think my beginner's regex skills are getting in the way. Does anybody have any hints?
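For comparison, here is a minimal sketch of a regex that does treat doubled quotes as content, using only System.Text.RegularExpressions. The class QuotedRegexSketch and the method Extract are hypothetical names for illustration. The pattern is written as a C# verbatim string, where every "" stands for one literal quote, so the regex itself is "((?:[^"]|"")*)".

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedRegexSketch
{
    // Verbatim string: each "" below is one literal quote in the pattern.
    // Effective pattern:  "((?:[^"]|"")*)"
    // i.e. an opening quote, then any run of non-quote characters or
    // doubled quotes (captured in group 1), then the closing quote.
    private static readonly Regex Quoted =
        new Regex(@"""((?:[^""]|"""")*)""");

    // Returns the text between the outer quotes, or null when nothing matches.
    public static string Extract(string input)
    {
        Match m = Quoted.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Extract is meant to be called on a whole line such as a = "a ""string"".";. If the same pattern were handed to Parse.Regex instead, the result would, as far as I know, be the whole match including the outer quotes, so those would still need trimming.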

EDIT: I was testing it here - http://regex101.com/r/eJ9aH1

© Stack Overflow or respective owner
