Tokenizing numbers for a parser
Posted by René Nyffenegger on Stack Overflow
Published on 2010-06-11T12:06:35Z
Tags: parser, tokenizing
I am writing my first parser and have a few questions concerning the tokenizer.
Basically, my tokenizer exposes a nextToken() function that is supposed to return the next token. These tokens are distinguished by a token-type. I think it would make sense to have the following token-types:
- SYMBOL (such as <, :=, ( and the like)
- REMARK (or a comment)
- NUMBER
- IDENT (such as the name of a function or a variable)
- STRING (something enclosed between "....")
Now, do you think this makes sense?
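The token-types listed above could be sketched roughly as follows. This is only an illustration in Python; the names TokenType and Token, and the choice to store the raw source text on each token, are assumptions made for the example and are not part of the original design.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    """The five token-types proposed in the question."""
    SYMBOL = auto()   # e.g. <, :=, (
    REMARK = auto()   # a comment
    NUMBER = auto()
    IDENT = auto()    # name of a function or a variable
    STRING = auto()   # something enclosed in double quotes


@dataclass(frozen=True)
class Token:
    """Hypothetical token record: a type plus the exact source characters."""
    type: TokenType
    text: str


# A nextToken() function would return values shaped like this one.
example = Token(TokenType.SYMBOL, ":=")
```

Keeping the token-type separate from the text lets the parser branch on the type first and only look at the text when it has to, for instance to tell := from <.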
Also, I am struggling with the NUMBER token-type. Do you think it makes more sense to split it further into a NUMBER and a FLOAT token-type? Without a FLOAT token-type, I'd receive a NUMBER (e.g. 402), a SYMBOL (.), and then another NUMBER (e.g. 203) when tokenizing a float such as 402.203.
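To make the difference concrete, here is a small sketch of both variants applied to the input 402.203. It uses Python regular expressions in place of a hand-written scanner, and the pattern names WITH_FLOAT and WITHOUT_FLOAT as well as the helper tokenize are hypothetical.

```python
import re

# Variant 1 (assumed): a FLOAT token-type exists and is tried before NUMBER,
# so the longest numeric form wins.
WITH_FLOAT = re.compile(r"(?P<FLOAT>\d+\.\d+)|(?P<NUMBER>\d+)|(?P<SYMBOL>\.)")

# Variant 2 (assumed): only NUMBER and SYMBOL exist.
WITHOUT_FLOAT = re.compile(r"(?P<NUMBER>\d+)|(?P<SYMBOL>\.)")


def tokenize(pattern, source):
    """Return (token-type, text) pairs for every match found in source."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(source)]


with_float = tokenize(WITH_FLOAT, "402.203")
# [('FLOAT', '402.203')]

without_float = tokenize(WITHOUT_FLOAT, "402.203")
# [('NUMBER', '402'), ('SYMBOL', '.'), ('NUMBER', '203')]
```

With the second variant, the parser has to reassemble the three tokens into one value, and it can no longer tell from the tokens alone whether whitespace stood between them unless positions are recorded as well.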
Finally, what do you think makes more sense for the tokenizer to return when it encounters -909? Should it return the SYMBOL - first, followed by the NUMBER 909, or should it return the NUMBER -909 right away?
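The two options can be compared with a short sketch. It again uses Python regular expressions as a stand-in for the real tokenizer; SIGNED, UNSIGNED and tokenize are hypothetical names, and the signed pattern assumes the tokenizer has no knowledge of the preceding token.

```python
import re

# Option A (assumed): the minus sign is part of the NUMBER token.
SIGNED = re.compile(r"(?P<NUMBER>-?\d+)|(?P<SYMBOL>-)")

# Option B (assumed): the minus sign is always its own SYMBOL token.
UNSIGNED = re.compile(r"(?P<NUMBER>\d+)|(?P<SYMBOL>-)")


def tokenize(pattern, source):
    """Return (token-type, text) pairs for every match found in source."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(source)]


signed_literal = tokenize(SIGNED, "-909")
# [('NUMBER', '-909')]

separate_minus = tokenize(UNSIGNED, "-909")
# [('SYMBOL', '-'), ('NUMBER', '909')]

# The same context-free signed pattern applied to a subtraction:
subtraction = tokenize(SIGNED, "5-909")
# [('NUMBER', '5'), ('NUMBER', '-909')]
```

The last line shows the cost of option A under this assumption: in 5-909 the subtraction operator disappears into the second number, so the tokenizer would need to track context to avoid it. With option B the parser decides whether a - is unary or binary.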
© Stack Overflow or respective owner