Big problem with regular expression in Lex (lexical analyzer)
- by Nazgulled
Hi,
I have some content like this:
author = "Marjan Mernik and Viljem Zumer",
title = "Implementation of multiple attribute grammar inheritance in the tool LISA",
year = 1999
author = "Manfred Broy and Martin Wirsing",
title = "Generalized
Heterogeneous Algebras and
Partial Interpretations",
year = 1983
author = "Ikuo Nakata and Masataka Sassa",
title = "L-Attributed LL(1)-Grammars are
LR-Attributed",
journal = "Information Processing Letters"
I need to capture everything between the double quotes of each title. My first try was this:
^(" "|\t)+"title"" "*=" "*"\"".+"\","
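Here is a rough way to see why, using Python's re module as a stand-in for lex. For these particular constructs the two behave the same on this input, and '.' excludes newlines in both, but treat it as an approximation of the scanner. The sample assumes the fields are indented, because the pattern requires leading blanks before title and the indentation in the post above seems to have been lost:

```python
import re

# Two of the entries from the question; the leading indentation is an assumption.
SAMPLE = (
    '  title = "Implementation of multiple attribute grammar inheritance in the tool LISA",\n'
    '  year = 1999\n'
    '  title = "Generalized\n'
    'Heterogeneous Algebras and\n'
    'Partial Interpretations",\n'
    '  year = 1983\n'
)

# Rough Python rendering of: ^(" "|\t)+"title"" "*=" "*"\"".+"\","
# '.' does not match a newline, so the match cannot leave the first line.
SINGLE_LINE = re.compile(r'^[ \t]+title *= *"(.+)",', re.MULTILINE)

titles = SINGLE_LINE.findall(SAMPLE)
print(titles)
```

Only the single-line title is found. The multi-line one has no closing ", on its first line, so it never matches.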
This catches the first example but not the other two, because their titles span multiple lines. I thought about adding \n somewhere to allow multiple lines, like this:
^(" "|\t)+"title"" "*=" "*"\""(.|\n)+"\","
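But this doesn't help. Instead, it catches everything: (.|\n)+ can run past the end of the title, and lex keeps the longest match, which ends at the last ", in the input.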
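The same Python stand-in shows the effect of (.|\n)+. The greedy repetition runs on to the last ", in the input rather than stopping at the first one, which matches lex's longest-match behaviour for this pattern. The sample and its indentation are again assumptions:

```python
import re

# Entries from the question, indented (assumed) so the ^[ \t]+ prefix can match.
SAMPLE = (
    '  title = "Implementation of multiple attribute grammar inheritance in the tool LISA",\n'
    '  year = 1999\n'
    '  author = "Manfred Broy and Martin Wirsing",\n'
    '  title = "Generalized\n'
    'Heterogeneous Algebras and\n'
    'Partial Interpretations",\n'
    '  year = 1983\n'
)

# Rough Python rendering of: ^(" "|\t)+"title"" "*=" "*"\""(.|\n)+"\","
GREEDY = re.compile(r'^[ \t]+title *= *"((?:.|\n)+)",', re.MULTILINE)

match = GREEDY.search(SAMPLE)
captured = match.group(1)
# The capture swallows the year and author lines and ends at the second title.
print(repr(captured))
```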
Then I thought: what I want is between double quotes, so what if I catch everything until I find another " followed by a comma? That way I would know whether I had reached the end of the title, no matter how many lines it spans, like this:
^(" "|\t)+"title"" "*=" "*"\""[^"\""]+"\","
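Here is the same idea as a Python sketch, with the indentation assumed as before: take anything except a double quote, then require the closing ", after it. It does handle titles that span several lines, because a negated character class also matches newlines (in lex as well as in Python):

```python
import re

# The two multi-line entries from the question, indented (assumed).
SAMPLE = (
    '  title = "Generalized\n'
    'Heterogeneous Algebras and\n'
    'Partial Interpretations",\n'
    '  year = 1983\n'
    '  title = "L-Attributed LL(1)-Grammars are\n'
    'LR-Attributed",\n'
    '  journal = "Information Processing Letters"\n'
)

# Everything up to the next double quote, which must then be followed by a comma.
NO_QUOTES = re.compile(r'^[ \t]+title *= *"([^"]+)",', re.MULTILINE)

titles = NO_QUOTES.findall(SAMPLE)
print(titles)
```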
But this has another problem. The examples above don't show it, but a double quote (") can also appear inside the title itself. For instance:
title = "aaaaaaa \"X bbbbbb",
And yes, it will always be preceded by a backslash (\).
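Running the quote-free class over that example line shows the failure. This is again a Python sketch, not lex itself, with the indentation assumed. The class stops at the escaped quote, the next character is X rather than a comma, and so the line is not matched at all:

```python
import re

# The example line from the question; '\\"' in Python source is a backslash plus a quote.
LINE = '  title = "aaaaaaa \\"X bbbbbb",\n'

NO_QUOTES = re.compile(r'^[ \t]+title *= *"([^"]+)",', re.MULTILINE)

# [^"]+ cannot step past the quote in \", and that quote is not followed by a comma.
match = NO_QUOTES.search(LINE)
print(match)  # None
```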
Any suggestions on how to fix this regexp?