How to implement string matching based on a pattern
- by Vincent Rischmann
I was asked to build a tool that can identify if a string match a pattern.
Example:
{1:20} stuff t(x) {a,b,c}
would match:
1 stuff tx a
20 stuff t c
It is a sort of regex but with a different syntax
Parentheses indicate an optional value
{1:20} is a interval; I will have to check if the token is a number and if it is between 1 and 20
{a,b,c} is just an enumeration; it can be either a or b or c
Right now I implemented this with a regex, and the interval stuff was a pain to do.
On my own time I tried implementing some kind of matcher by hand, but it turns out it's not that easy to do.
By experimenting I ended up with a function that generates a state table from the pattern and a state machine. It worked well until I tried to implement the optional value, and I got stuck and how to generate the state table.
After that I searched how I could do this, and that led me to stuff like LL parser, LALR parser, recursive-descent parser, context-free grammars, etc. I never studied any of this so it's hard to know what is relevant here, but I think this is what I need:
A grammar
A parser which generates states from the grammar and a pattern
A state machine to see if a string match the states
So my first question is: Is this right ? And second question, what do you recommend I read/study to be able to implement this ?