String patterns that can be used to filter and group files

Posted by Louis Rhys on Programmers
Published on 2012-09-27T06:55:21Z

One of our applications filters files in a certain directory, extracts some data from them, and exports a document built from the extracted data. The extraction algorithm depends on the file, and so far we use a regex to select which algorithm to apply: for example, .*\.txt is processed by algorithm A, foo[0-5]\.xml by algorithm B, and so on.
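
A minimal sketch of the current regex-based dispatch, written in Python for illustration. The functions algorithm_a and algorithm_b and the RULES table are hypothetical stand-ins; it also assumes the first rule whose pattern matches the whole file name wins, which the post does not specify.

```python
import re
from typing import Callable, Optional

# Hypothetical extraction algorithms standing in for "algorithm A" and "algorithm B".
def algorithm_a(path: str) -> str:
    return f"A processed {path}"

def algorithm_b(path: str) -> str:
    return f"B processed {path}"

# Hypothetical user-configured rules: the first pattern that fully matches wins.
RULES: list[tuple[re.Pattern[str], Callable[[str], str]]] = [
    (re.compile(r".*\.txt"), algorithm_a),
    (re.compile(r"foo[0-5]\.xml"), algorithm_b),
]

def select_algorithm(filename: str) -> Optional[Callable[[str], str]]:
    """Return the algorithm for the first rule matching the whole file name."""
    for pattern, algorithm in RULES:
        if pattern.fullmatch(filename):
            return algorithm
    return None  # no rule matched: the file is not processed
```

With these rules, notes.txt selects algorithm_a, foo3.xml selects algorithm_b, and foo7.xml selects nothing. Each file is classified on its own, which is exactly what breaks down once files must be processed in groups.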

However, we now need some files to be processed together. For example, in one case we need two files, matching foo.*\.xml and bar.*\.xml: part of the information to be extracted is in the foo file, and the rest is in the bar file. Moreover, we need to make sure the parts matched by the wildcard agree. For example, given these 6 files:

foo1.xml
foo23.xml
bar1.xml
bar9.xml
bar23.xml
foo4.xml

I would expect foo1 and bar1 to be identified as one group, and foo23 and bar23 as another. bar9 and foo4 have no partner, so they will not be processed.

Since the filter is configured by the user, we need a pattern syntax that can express the requirement above. I don't think standard regex can express this: (foo|bar).*\.xml matches all 6 files above, but it gives us no way to tell which file is paired with which.
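
A quick Python check of that limitation, assuming the pattern must match the whole file name: a regex answers yes or no for one string at a time, so every file passes and nothing links foo1.xml to bar1.xml.

```python
import re

files = ["foo1.xml", "foo23.xml", "bar1.xml", "bar9.xml", "bar23.xml", "foo4.xml"]
pattern = re.compile(r"(foo|bar).*\.xml")

# Each file is tested independently; the result carries no pairing information.
matched = [f for f in files if pattern.fullmatch(f)]
print(matched)  # ['foo1.xml', 'foo23.xml', 'bar1.xml', 'bar9.xml', 'bar23.xml', 'foo4.xml']
```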

Is there a standard pattern syntax that can express this? Or is there a way to extend regex to support it that would be easy to implement?
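
One possible extension, sketched in Python under stated assumptions: keep a single user-supplied regex, but adopt a hypothetical convention that it defines two named groups, role (which member of the group a file is) and key (the wildcard part that must agree). The surrounding code then groups files by key and keeps only groups in which every required role is present. The group names and the find_groups helper are illustrative, not a standard feature; the (?P<name>...) syntax is Python's, and other engines write named groups as (?<name>...).

```python
import re
from collections import defaultdict

def find_groups(filenames, pattern, required_roles):
    """Group files whose 'key' capture is identical, keeping only complete groups.

    Hypothetical convention: the pattern defines named groups 'role' and 'key'.
    """
    regex = re.compile(pattern)
    by_key = defaultdict(dict)  # key -> {role: filename}
    for name in filenames:
        m = regex.fullmatch(name)
        if m:
            by_key[m.group("key")][m.group("role")] = name
    # Keep only keys for which every required role was found.
    return {
        key: members
        for key, members in by_key.items()
        if all(role in members for role in required_roles)
    }

files = ["foo1.xml", "foo23.xml", "bar1.xml", "bar9.xml", "bar23.xml", "foo4.xml"]
groups = find_groups(files, r"(?P<role>foo|bar)(?P<key>.*)\.xml", ["foo", "bar"])
print(groups)
```

For the six example files this yields two groups, key "1" (foo1.xml with bar1.xml) and key "23" (foo23.xml with bar23.xml); bar9.xml and foo4.xml are dropped because their group is incomplete. The regex still matches one file at a time; the pairing lives entirely in the grouping step.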

© Programmers or respective owner
