Tokenizing numbers for a parser

Posted by René Nyffenegger on Stack Overflow
Published on 2010-06-11T12:06:35Z Indexed on 2010/06/11 12:12 UTC


I am writing my first parser and have a few questions concerning the tokenizer.

Basically, my tokenizer exposes a nextToken() function that is supposed to return the next token. These tokens are distinguished by a token-type. I think it would make sense to have the following token-types:

  • SYMBOL (such as <, := and the like, including parentheses)
  • REMARK (or a comment)
  • NUMBER
  • IDENT (such as the name of a function or a variable)
  • STRING (something enclosed in double quotes, such as "....")
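The five token types above can be sketched as a small hand-written tokenizer whose `next_token()` method returns one token per call. This is a minimal Python sketch, not the author's code: the `TokenType`, `Token` and `Tokenizer` names, the extra `EOF` token type, the symbol set, and the assumption that `#` starts a comment running to the end of the line are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    SYMBOL = auto()
    REMARK = auto()
    NUMBER = auto()
    IDENT = auto()
    STRING = auto()
    EOF = auto()  # assumption: returned once the input is exhausted


@dataclass
class Token:
    type: TokenType
    text: str


# Hypothetical symbol set; longer symbols come before their prefixes
# (":=" before ":", "<=" before "<") so the longest match wins.
SYMBOLS = [":=", "<=", ">=", "<>", "<", ">", "=", ":",
           "(", ")", "+", "-", "*", "/", ".", ",", ";"]


class Tokenizer:
    def __init__(self, src: str):
        self.src = src
        self.pos = 0

    def next_token(self) -> Token:
        src = self.src
        # Skip whitespace between tokens.
        while self.pos < len(src) and src[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(src):
            return Token(TokenType.EOF, "")
        start = self.pos
        ch = src[start]
        if ch == "#":  # assumption: '#' comments run to end of line
            end = src.find("\n", start)
            self.pos = len(src) if end == -1 else end
            return Token(TokenType.REMARK, src[start:self.pos])
        if ch.isdigit():  # NUMBER: digits only, no sign, no '.'
            while self.pos < len(src) and src[self.pos].isdigit():
                self.pos += 1
            return Token(TokenType.NUMBER, src[start:self.pos])
        if ch.isalpha() or ch == "_":  # IDENT: letter/underscore, then alnum/underscore
            while self.pos < len(src) and (src[self.pos].isalnum() or src[self.pos] == "_"):
                self.pos += 1
            return Token(TokenType.IDENT, src[start:self.pos])
        if ch == '"':  # STRING: up to the next double quote, no escapes
            end = src.find('"', start + 1)
            if end == -1:
                raise SyntaxError(f"unterminated string at {start}")
            self.pos = end + 1
            return Token(TokenType.STRING, src[start + 1:end])  # quotes stripped
        for sym in SYMBOLS:
            if src.startswith(sym, start):
                self.pos += len(sym)
                return Token(TokenType.SYMBOL, sym)
        raise SyntaxError(f"unexpected character {ch!r} at {start}")
```

Because NUMBER here covers digits only, `402.203` comes back as NUMBER `402`, SYMBOL `.`, NUMBER `203`, which is the situation described in the next paragraph.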

Now, do you think this makes sense?

Also, I am struggling with the NUMBER token-type. Do you think it makes more sense to split it further into a NUMBER and a FLOAT token-type? Without a FLOAT token-type, parsing a float such as 402.203 would give me a NUMBER (402), a SYMBOL (.) and then another NUMBER (203).
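If a separate FLOAT token type is used, the number rule can look ahead for a fractional part and return the whole literal as one token. The sketch below assumes a hypothetical `scan_number(src, pos)` helper that `next_token()` would call when the current character is a digit; exponents such as `1e10` and a leading `.5` are deliberately not handled.

```python
from typing import Tuple


def scan_number(src: str, pos: int) -> Tuple[str, str, int]:
    """Scan a numeric literal starting at src[pos] (hypothetical helper).

    Returns (token_type, lexeme, new_pos): "NUMBER" for an integer,
    "FLOAT" when a fractional part follows.
    """
    start = pos
    while pos < len(src) and src[pos].isdigit():
        pos += 1
    # Treat '.' as a decimal point only when a digit follows it, so that
    # "402." or a hypothetical range "1..5" leaves the '.' for the SYMBOL rule.
    if pos + 1 < len(src) and src[pos] == "." and src[pos + 1].isdigit():
        pos += 1
        while pos < len(src) and src[pos].isdigit():
            pos += 1
        return "FLOAT", src[start:pos], pos
    return "NUMBER", src[start:pos], pos
```

Requiring a digit after the `.` is a design choice: it keeps `402.` or `1..5` from being swallowed into a float, which may or may not suit the language being parsed.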

Finally, what do you think makes more sense for the tokenizer to return when it encounters -909? Should it return the SYMBOL - first, followed by the NUMBER 909, or should it return a NUMBER -909 right away?
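One common convention is to keep numeric tokens unsigned and let the parser treat `-` as a unary operator. In C, for instance, an integer constant has no sign, and Python's own `tokenize` module reports `-909` as an operator `-` followed by the number `909`. A practical reason: if the tokenizer produced NUMBER -909 directly, `3-909` would lex as NUMBER 3 followed by NUMBER -909 rather than a subtraction. The sketch below is a hypothetical minimal tokenizer plus a recursive-descent evaluator for `+`, `-` and unary minus; the grammar and all names are assumptions for illustration.

```python
import re


def tokenize(src):
    """Split src into (type, text) pairs; NUMBER tokens never carry a sign."""
    tokens = []
    for text in re.findall(r"\d+|\S", src):
        # isdecimal() matches the same characters as \d, so int() below is safe.
        kind = "NUMBER" if text.isdecimal() else "SYMBOL"
        tokens.append((kind, text))
    return tokens


class Parser:
    """Hypothetical grammar:
        expr  := unary (('+' | '-') unary)*
        unary := '-' unary | NUMBER
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", "")

    def advance(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expr(self):
        value = self.unary()
        # Binary '+' / '-' are left-associative.
        while self.peek() in (("SYMBOL", "+"), ("SYMBOL", "-")):
            op = self.advance()[1]
            rhs = self.unary()
            value = value + rhs if op == "+" else value - rhs
        return value

    def unary(self):
        kind, text = self.advance()
        if (kind, text) == ("SYMBOL", "-"):  # unary minus: the parser decides
            return -self.unary()
        if kind == "NUMBER":
            return int(text)
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(src):
    parser = Parser(tokenize(src))
    value = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return value
```

With this split, `evaluate("3-909")` subtracts and `evaluate("3 - -909")` applies unary minus to the second operand; the tokenizer never has to guess which meaning of `-` is intended.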

© Stack Overflow or respective owner
