Is it possible to create a single tokenizer to parse this?
Posted by Adrian on Programmers, published on 2013-05-16T16:22:47Z.
This follows on from this other Q&A thread, but goes into details that are out of scope for the original question.
I am generating a parser for a context-sensitive grammar that accepts the following subset of symbols: `,`, `[`, `]`, `{`, `}`, `m/[a-zA-Z_][a-zA-Z_0-9]*/` (identifiers) and `m/[0-9]+/` (unsigned integers), with the last two written as Perl-style regular expressions.
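Taken on its own, the symbol set above could be handled by an ordinary regex-driven tokenizer. The following is a minimal Python sketch, not the poster's code; the token names (`IDENT`, `INT`, `COMMA`, ...), the function name `tokenize_ordinary`, and the choice to discard whitespace between tokens are all assumptions.

```python
import re

# Hypothetical token names; the question lists only the symbols themselves.
# The two m/.../ patterns from the question become IDENT and INT here.
TOKEN_SPEC = [
    ("SKIP", r"\s+"),  # assumption: whitespace only separates tokens and is dropped
    ("IDENT", r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("INT", r"[0-9]+"),
    ("COMMA", r","),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
]
# One alternation of named groups; match.lastgroup names the alternative that matched.
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize_ordinary(text):
    """Split text into (kind, lexeme) pairs using only the ordinary symbol set."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize_ordinary("abc[1]"))
# [('IDENT', 'abc'), ('LBRACK', '['), ('INT', '1'), ('RBRACK', ']')]
```

Applied to `{ abc[1] }, }`, this ordinary tokenizer yields LBRACE, IDENT, LBRACK, INT, RBRACK, RBRACE, COMMA, RBRACE: it splits `abc[1]` and `},` apart instead of keeping them whole, which is exactly the behaviour the brace-group rule has to override.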
The grammar can take in the string `{ abc[1] }, }` and parse it as (`{`, `abc[1]`, `},`, `}`).
Another example would be to take `{ abc[1] [, }` and parse it as (`{`, `abc[1]`, `[,`, `}`).
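The group behaviour in these two examples can be sketched in isolation. This minimal Python sketch assumes a hypothetical helper `split_brace_group` that receives the text following an already-consumed opening `{`; it reads the examples as saying that only a word that is exactly `}` ends the group, so `},` and `[,` are kept whole.

```python
import re


def split_brace_group(text):
    """Split the text that follows an opening '{' into whitespace-separated words.

    Hypothetical helper: returns (words, rest), where rest is the text after the
    lone '}' that closes the group. Only a word that is exactly '}' ends the group,
    so words such as '},' or '[,' are kept whole.
    """
    words = []
    for match in re.finditer(r"\S+", text):
        word = match.group()
        if word == "}":
            return words, text[match.end():]
        words.append(word)
    raise ValueError("no lone '}' closes the whitespace-tokenized group")


print(split_brace_group(" abc[1] }, }"))  # (['abc[1]', '},'], '')
print(split_brace_group(" abc[1] [, }"))  # (['abc[1]', '[,'], '')
```

Returning the remaining text lets a caller resume ordinary tokenization after the group ends.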
This is similar to the grammar Perl uses for its `qw()` syntax. The braces indicate that their contents are to be whitespace-tokenized, and a closing brace must stand on its own, as a separate whitespace-delimited item, to end the whitespace-tokenized group; that is why `},` and `[,` in the examples above are kept whole rather than being read as a closing brace or bracket followed by a comma. Can this be done with a single lexer/tokenizer, or would a separate tokenizer be needed when parsing this group?
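One possible shape for a single tokenizer is a lexer with two modes that it switches itself: ordinary mode for the symbol list, and a whitespace-splitting mode entered after `{` and left at a lone `}`. This is comparable in spirit to flex start conditions or ANTLR 4 lexer modes. The Python below is only a minimal sketch under assumptions the question does not state: every `{` seen in ordinary mode opens a group, groups do not nest, whitespace between ordinary tokens is discarded, and the token names (including `WORD`) and the function name `tokenize` are hypothetical.

```python
import re

# Ordinary mode: the symbols listed in the question (token names are hypothetical).
ORDINARY_RE = re.compile(
    r"(?P<SKIP>\s+)"
    r"|(?P<IDENT>[a-zA-Z_][a-zA-Z_0-9]*)"
    r"|(?P<INT>[0-9]+)"
    r"|(?P<COMMA>,)"
    r"|(?P<LBRACK>\[)"
    r"|(?P<RBRACK>\])"
    r"|(?P<LBRACE>\{)"
    r"|(?P<RBRACE>\})"
)
# Group mode: skip leading whitespace, then capture one whitespace-delimited word.
WORD_RE = re.compile(r"\s*(\S+)")


def tokenize(text):
    """One tokenizer with two modes: ordinary symbols, and qw-style brace groups."""
    tokens = []
    pos = 0
    in_group = False  # assumption: groups do not nest
    while pos < len(text):
        if in_group:
            match = WORD_RE.match(text, pos)
            if match is None:  # only whitespace remains, so the group is never closed
                break
            word = match.group(1)
            if word == "}":
                tokens.append(("RBRACE", word))
                in_group = False  # a lone '}' returns to ordinary mode
            else:
                tokens.append(("WORD", word))  # '},' or '[,' stay whole here
            pos = match.end()
        else:
            match = ORDINARY_RE.match(text, pos)
            if match is None:
                raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
            kind = match.lastgroup
            if kind != "SKIP":
                tokens.append((kind, match.group()))
            if kind == "LBRACE":
                in_group = True  # assumption: every '{' opens a whitespace-tokenized group
            pos = match.end()
    if in_group:
        raise SyntaxError("unterminated whitespace-tokenized group")
    return tokens


print(tokenize("{ abc[1] }, }"))
# [('LBRACE', '{'), ('WORD', 'abc[1]'), ('WORD', '},'), ('RBRACE', '}')]
```

Under these assumptions the mode switch depends only on characters the tokenizer has already read, so no feedback from the parser is needed. If only the parser could decide whether a given `{` starts a group, the parser would instead have to tell the lexer which mode to use.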