I'm trying to write a very simple parser in C#.
I need a lexer: something that lets me associate regular expressions with tokens, so it reads in regexes and gives me back symbols.
It seems like I ought to be able to use Regex to do the actual heavy lifting, but I can't see an easy way to do it. For one thing, Regex only seems to work on strings, not streams (why is that!?!?).
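The only workaround I can see for the stream problem is to read the whole thing into a string up front, something like this throwaway example (the input and pattern are just made up for illustration):

using System;
using System.IO;
using System.Text.RegularExpressions;

class StreamWorkaround
{
    static void Main()
    {
        // Pretend this TextReader is the real input stream.
        using (TextReader reader = new StringReader("42 + foo"))
        {
            // Slurp everything so Regex has a plain string to chew on.
            string input = reader.ReadToEnd();
            Match match = Regex.Match(input, @"\d+");   // example pattern: a number
            Console.WriteLine(match.Value);             // prints "42"
        }
    }
}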
Basically, I want an implementation of the following interface:
interface ILexer : IDisposable
{
    /// <summary>
    /// Return true if there are more tokens to read
    /// </summary>
    bool HasMoreTokens { get; }

    /// <summary>
    /// The actual contents that matched the token
    /// </summary>
    string TokenContents { get; }

    /// <summary>
    /// The particular token in "tokenDefinitions" that was matched (e.g. "STRING", "NUMBER", "OPEN PARENS", "CLOSE PARENS")
    /// </summary>
    object Token { get; }

    /// <summary>
    /// Move to the next token
    /// </summary>
    void Next();
}
interface ILexerFactory
{
    /// <summary>
    /// Create a Lexer for converting a stream of characters into tokens
    /// </summary>
    /// <param name="reader">TextReader that supplies the underlying stream</param>
    /// <param name="tokenDefinitions">A dictionary from regular expressions to their "token identifiers"</param>
    /// <returns>The lexer</returns>
    ILexer CreateLexer(TextReader reader, IDictionary<string, object> tokenDefinitions);
}
So, pluz send the codz...
No, seriously, I am about to start writing an implementation of the above interface, yet I find it hard to believe that there isn't already some simple way of doing this in .NET (2.0).
So, any suggestions for a simple way to do the above? (Also, I don't want any "code generators". Performance is not important for this thing, and I don't want to introduce any complexity into the build process.)
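For what it's worth, this is roughly the (untested) sketch I'd write myself if nothing better turns up: slurp the whole TextReader into a string in the constructor, then try each token regex in turn with a \G anchor so matches always start at the current position. The class names here are mine, not anything that ships with the framework.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Rough, untested sketch: cheat by reading the whole input up front, then
// try each token regex anchored at the current position with \G.
// Assumes no token pattern can match the empty string.
class RegexLexer : ILexer
{
    private readonly string _input;
    private readonly IDictionary<string, object> _tokenDefinitions;
    private int _position;
    private string _tokenContents;
    private object _token;

    public RegexLexer(TextReader reader, IDictionary<string, object> tokenDefinitions)
    {
        _input = reader.ReadToEnd();        // performance is explicitly a non-goal
        _tokenDefinitions = tokenDefinitions;
    }

    public bool HasMoreTokens
    {
        get { return _position < _input.Length; }
    }

    public string TokenContents
    {
        get { return _tokenContents; }
    }

    public object Token
    {
        get { return _token; }
    }

    public void Next()
    {
        // Note: Dictionary enumeration order is not guaranteed, so a list of
        // pairs would be better if token priority matters.
        foreach (KeyValuePair<string, object> definition in _tokenDefinitions)
        {
            // \G plus the Match(input, startat) overload forces the match to
            // begin exactly at _position, so nothing gets silently skipped.
            Regex regex = new Regex(@"\G(?:" + definition.Key + @")");
            Match match = regex.Match(_input, _position);
            if (match.Success)
            {
                _tokenContents = match.Value;
                _token = definition.Value;
                _position += match.Length;
                return;
            }
        }
        throw new InvalidOperationException("Unrecognised input at position " + _position);
    }

    public void Dispose()
    {
        // Nothing to release; the reader was fully consumed in the constructor.
    }
}

class RegexLexerFactory : ILexerFactory
{
    public ILexer CreateLexer(TextReader reader, IDictionary<string, object> tokenDefinitions)
    {
        return new RegexLexer(reader, tokenDefinitions);
    }
}

The part I'm least sure about is the \G trick, but as far as I can tell, combining it with the Regex.Match(string, int) overload is the documented way to anchor a match at an arbitrary position, which is what keeps this simple. Is there anything in the framework that already does this for me?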