Extracting useful information from free text
- by insta
We filter and analyse seats for events. Apparently writing a domain query language for the floor people isn't an option. I'm using C# 4.0 & .NET 4.0, and have relatively free reign to use whatever open-source tools are available. </background-info>
If a request comes in for "FLOOR B", the sales people want it to show up if they've entered "FLOOR A-FLOOR F" in a filter. The only problem I have is that there's absolutely no structure to the parsed parameters. I get the string already concatenated (it actually uses a tilde instead of dash). Examples I've seen so far with matches after each:
101WC-199WC (needs to match 150WC)
AAA-ZZZ (needs to match AAA, BBB, ABC but not BB)
LOGE15-LOGE20 (needs to match LOGE15 but not LOGE150)
At first I wanted to try just stripping off the numeric part of the lower and upper, and then incrementing through that. The problem I have is that only some entries have numbers, sometimes the numbers AND letters increment, sometimes its all letters that increment. Since I can't impose any kind of grammar to use (I really wanted [..] expansion syntax), I'm stuck using these entries.
Are there any suggestions for how to approach this parsing problem?