Extracting useful information from free text

Posted by insta on Programmers
Published on 2012-12-04T22:37:01Z Indexed on 2012/12/05 5:29 UTC

We filter and analyse seats for events. Apparently writing a domain query language for the floor people isn't an option. I'm using C# 4.0 and .NET 4.0, and have relatively free rein to use whatever open-source tools are available.

If a request comes in for "FLOOR B", the sales people want it to show up if they've entered "FLOOR A-FLOOR F" in a filter. The only problem I have is that there's absolutely no structure to the parsed parameters. I get the string already concatenated (it actually uses a tilde instead of a dash). Examples I've seen so far, with the matches required for each:

  • 101WC-199WC (needs to match 150WC)
  • AAA-ZZZ (needs to match AAA, BBB, ABC but not BB)
  • LOGE15-LOGE20 (needs to match LOGE15 but not LOGE150)

At first I wanted to try just stripping off the numeric part of the lower and upper bounds, and then incrementing through that. The problem I have is that only some entries have numbers: sometimes the numbers AND letters increment, sometimes it's all letters that increment. Since I can't impose any kind of grammar on the input (I really wanted [..] expansion syntax), I'm stuck using these entries as they are.
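One possible alternative to incrementing through the range is to compare instead of enumerate. This is a minimal sketch, written in Python for brevity (the idea should port to C# directly). It splits each label into runs of digits and runs of letters, requires the candidate to have the same "shape" as both bounds, and then checks each run against the matching run of the lower and upper bound. The names tokenize, same_shape and in_range are hypothetical, and the component-by-component rule is an assumption: it satisfies the three examples above, but it would reject A7 for a filter like A1~B5, which may or may not be what the sales people expect.

```python
import re

# A label is treated as alternating runs of digits and letters;
# spaces and punctuation are ignored (assumption).
_TOKEN = re.compile(r"[0-9]+|[A-Za-z]+")


def tokenize(label):
    """Return digit runs as ints and letter runs as upper-cased strings."""
    return [int(t) if t.isdigit() else t.upper() for t in _TOKEN.findall(label)]


def same_shape(a, b):
    """True if both token lists have the same kinds of runs in the same order,
    and every letter run has the same length (so BB never matches AAA~ZZZ)."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if type(x) is not type(y):
            return False
        if isinstance(x, str) and len(x) != len(y):
            return False
    return True


def in_range(candidate, range_text, sep="~"):
    """Check a candidate label against a 'LOW~HIGH' filter string."""
    low_text, found, high_text = range_text.partition(sep)
    if not found:
        high_text = low_text  # no separator: treat the filter as a single value
    low, high, cand = tokenize(low_text), tokenize(high_text), tokenize(candidate)
    if not cand:
        return False
    if not (same_shape(low, cand) and same_shape(high, cand)):
        return False
    # Numbers compare numerically, letter runs alphabetically, run by run.
    return all(lo <= c <= hi for lo, c, hi in zip(low, cand, high))


print(in_range("150WC", "101WC~199WC"))
print(in_range("BB", "AAA~ZZZ"))
print(in_range("LOGE150", "LOGE15~LOGE20"))
```

Because nothing is enumerated, a filter such as AAA~ZZZ costs the same as LOGE15~LOGE20, and labels whose shape differs from the bounds are rejected up front rather than guessed at.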

Are there any suggestions for how to approach this parsing problem?
