How to implement string matching based on a pattern

Posted by Vincent Rischmann on Programmers See other posts from Programmers or by Vincent Rischmann
Published on 2012-10-19T22:48:47Z Indexed on 2012/10/19 23:19 UTC
Read the original article Hit count: 184

Filed under:
|
|

I was asked to build a tool that can identify if a string match a pattern.

Example:

{1:20} stuff t(x) {a,b,c}

would match:

  • 1 stuff tx a
  • 20 stuff t c

It is a sort of regex but with a different syntax

  • Parentheses indicate an optional value
  • {1:20} is a interval; I will have to check if the token is a number and if it is between 1 and 20
  • {a,b,c} is just an enumeration; it can be either a or b or c

Right now I implemented this with a regex, and the interval stuff was a pain to do.

On my own time I tried implementing some kind of matcher by hand, but it turns out it's not that easy to do.

By experimenting I ended up with a function that generates a state table from the pattern and a state machine. It worked well until I tried to implement the optional value, and I got stuck and how to generate the state table.

After that I searched how I could do this, and that led me to stuff like LL parser, LALR parser, recursive-descent parser, context-free grammars, etc. I never studied any of this so it's hard to know what is relevant here, but I think this is what I need:

  • A grammar
  • A parser which generates states from the grammar and a pattern
  • A state machine to see if a string match the states

So my first question is: Is this right ? And second question, what do you recommend I read/study to be able to implement this ?

© Programmers or respective owner

Related posts about design

Related posts about strings