Parsing language for both binary and character files

Posted by Thorsten S. on Stack Overflow See other posts from Stack Overflow or by Thorsten S.
Published on 2010-03-10T23:26:16Z Indexed on 2010/03/11 17:24 UTC
Read the original article Hit count: 309

The problem: You have some data and your program needs specified input. For example strings which are numbers. You are searching for a way to transform the original data in a format you need. And the problem is: The source can be anything. It can be XML, property lists, binary which contains the needed data deeply embedded in binary junk. And your output format may vary also: It can be number strings, float, doubles....

You don't want to program. You want routines which gives you commands capable to transform the data in a form you wish. Surely it contains regular expressions, but it is very good designed and it offers capabilities which are sometimes much more easier and more powerful. Something like a super-grep which you can access (!) as program routines, not only as tool.

It allows:

  • joining/grouping/merging of results
  • inserting/deleting/finding/replacing
  • write macros which allows to execute a command chain repeatedly
  • meta-grouping (lists->tables->hypertables)

Example (No, I am not looking for a solution to this, it is just an example): You want to read xml strings embedded in a binary file with variable length records. Your tool reads the record length and deletes the junk surrounding your text. Now it splits open the xml and extracts the strings. Being Indian number glyphs and containing decimal commas instead of decimal points, your tool transforms it into ASCII and replaces commas with points. Now the results must be stored into matrices of variable length....etc. etc.

I am searching for a good language / language-design and if possible, an implementation. Which design do you like or even, if it does not fulfill the conditions, wouldn't you want to miss ?

EDIT: The question is if a solution for the problem exists and if yes, which implementations are available. You DO NOT implement your own sorting algorithm if Quicksort, Mergesort and Heapsort is available. You DO NOT invent your own text parsing method if you have regular expressions. You DO NOT invent your own 3D language for graphics if OpenGL/Direct3D is available. There are existing solutions or at least papers describing the problem and giving suggestions. And there are people who may have worked and experienced such problems and who can give ideas and suggestions. The idea that this problem is totally new and I should work out and implement it myself without background knowledge seems for me, I must admit, totally off the mark.

© Stack Overflow or respective owner

Related posts about parser

Related posts about language-agnostic