Big problem with regular expression in Lex (lexical analyzer)
Posted by Nazgulled on Stack Overflow
Published on 2010-03-26T23:35:42Z
Hi,
I have some content like this:
author = "Marjan Mernik and Viljem Zumer",
title = "Implementation of multiple attribute grammar inheritance in the tool LISA",
year = 1999
author = "Manfred Broy and Martin Wirsing",
title = "Generalized
Heterogeneous Algebras and
Partial Interpretations",
year = 1983
author = "Ikuo Nakata and Masataka Sassa",
title = "L-Attributed LL(1)-Grammars are
LR-Attributed",
journal = "Information Processing Letters"
I need to capture everything between the double quotes of each title. My first try was this:
^(" "|\t)+"title"" "*=" "*"\"".+"\","
It catches the first example, but not the other two, because their titles span multiple lines. I thought about adding \n somewhere to allow multiple lines, like this:
^(" "|\t)+"title"" "*=" "*"\""(.|\n)+"\","
But this doesn't help; instead, it catches everything.
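To see why the first two attempts behave this way, here is a small sketch that uses Python's re module as a stand-in for lex. It assumes simplified patterns (the leading ^(" "|\t)+ part is dropped) and reuses sample text from above. Like lex, Python's . does not match a newline, which is why the first pattern only finds single-line titles. Allowing newlines (re.DOTALL here, (.|\n) in lex) turns .+ into a greedy run that ends at the last ", in the input instead of the first one.

```python
import re

# Inputs taken from the question (leading indentation omitted).
one_line = 'title = "Implementation of multiple attribute grammar inheritance in the tool LISA",\n'
two_entries = (
    'title = "Generalized\nHeterogeneous Algebras and\nPartial Interpretations",\n'
    'year = 1983\n'
    'title = "L-Attributed LL(1)-Grammars are\nLR-Attributed",\n'
)

# Like lex, Python's "." does not match "\n" by default.
dot_pattern = re.compile(r'title *= *".+",')
# With re.DOTALL, "." behaves like lex's (.|\n).
dot_or_newline = re.compile(r'title *= *".+",', re.DOTALL)

print(dot_pattern.search(one_line) is not None)     # True: single-line title found
print(dot_pattern.search(two_entries) is not None)  # False: "." cannot cross a line break

# The greedy .+ runs on to the LAST '",' in the input, swallowing
# "year = 1983" and the whole second title along the way.
greedy = dot_or_newline.search(two_entries)
print(greedy.group(0))
```

Lex always prefers the longest possible match, so (.|\n)+ followed by "\"," has the same effect there.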
Then I thought: what I want is between double quotes, so what if I catch everything until I find another double quote (") followed by a comma (,)? That way I would know I had reached the end of the title, no matter how many lines it spans, like this:
^(" "|\t)+"title"" "*=" "*"\""[^"\""]+","
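A sketch of this third attempt, again with Python's re standing in for lex and with the leading whitespace part dropped. A negated class such as [^"] matches newlines in both tools, so this approach does handle multi-line titles. As transcribed here, though, the pattern has no "\"" between the class and ",". Since [^"]+ cannot consume the closing quote, that comma can only ever match a comma inside the title. The second pattern below adds the closing quote explicitly.

```python
import re

multi_line = 'title = "Generalized\nHeterogeneous Algebras and\nPartial Interpretations",\n'

# The third attempt as transcribed: [^"]+ cannot consume the closing quote,
# so the literal comma after it can only match a comma inside the title.
as_posted = re.compile(r'title *= *"[^"]+,')
# Matching the closing quote explicitly fixes that. The negated class [^"]
# matches "\n" in Python just as it does in lex, so the title may span lines.
not_quote = re.compile(r'title *= *"([^"]+)",')

print(as_posted.search(multi_line))  # None
print(repr(not_quote.search(multi_line).group(1)))
```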
But this has another problem. The examples above don't show it, but a double quote (") can also appear inside the title. For instance:
title = "aaaaaaa \"X bbbbbb",
And yes, such a quote is always preceded by a backslash (\).
Any suggestions to fix this regexp?
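One common way to handle this is the usual string-with-escapes pattern: describe the title body as a run of either "any character except a quote or a backslash" or "a backslash followed by any character". An escaped quote is then consumed as part of a backslash pair and can never end the string. The sketch below again uses Python's re as a stand-in for lex, drops the leading whitespace part, and compares the [^"] approach with this one on sample text from the question.

```python
import re

text = (
    'title = "Implementation of multiple attribute grammar inheritance in the tool LISA",\n'
    'year = 1999\n'
    'title = "Generalized\nHeterogeneous Algebras and\nPartial Interpretations",\n'
    'year = 1983\n'
    'title = "aaaaaaa \\"X bbbbbb",\n'
)

# The [^"] approach: the quote after the backslash ends the class too early,
# so the third title is never matched.
not_quote = re.compile(r'title *= *"([^"]+)",')

# Body = runs of "anything but quote or backslash" or "backslash + any char".
# re.DOTALL lets the escaped character be a newline too, like \\(.|\n) in lex.
TITLE = re.compile(r'title *= *"((?:[^"\\]|\\.)*)",', re.DOTALL)

old = [m.group(1) for m in not_quote.finditer(text)]
titles = [m.group(1) for m in TITLE.finditer(text)]

print(len(old))  # 2
for t in titles:
    print(repr(t))
```

The two alternatives never overlap (one starts with a backslash, the other excludes it), so the body cannot run past the first unescaped quote. Translated back to lex, the same idea might read roughly as follows (an untested sketch); a negated class like [^\\\"] already matches newlines there:
^[ \t]+"title"[ \t]*=[ \t]*\"([^\\\"]|\\(.|\n))*\",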
© Stack Overflow or respective owner