Best way to parse a plain text file with a nested information structure

Posted by Beffa on Stack Overflow
Published on 2010-03-17T01:11:49Z Indexed on 2010/03/17 2:51 UTC

Filed under: ruby | parsing | treetop | regex

The text file contains hundreds of entries like this one (the format is an MT940 bank statement):

{1:F01AHHBCH110XXX0000000000}{2:I940X           N2}{3:{108:XBS/091502}}{4:
:20:XBS/091202/0001
:25:5887/507004-50
:28C:140/1
:60F:C0914CHF7789,
:61:0912021202D36,80NTRFNONREF//0887-1202-29-941
04392579-0 LUTHY + xxx, ZUR
:86:6034?60LUTHY + xxxx, ZUR vom 01.12.09 um 16:28 Karten-Nr. 2232
2579-0
:62F:C091202CHF52,2
:64:C091302CHF52,2
-}

Each entry should end up as a Hash in an Array of Hashes, like this:

[{"1"=>"F01AHHBCH110XXX0000000000",
  "2"=>"I940X           N2",
  "3"=>{"108"=>"XBS/091502"},
  etc.
}]
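
One way to build such a structure is a small hand-written recursive parser on top of Ruby's standard StringScanner. The following is only a sketch under stated assumptions, not a complete MT940 reader: the names parse_block and parse_messages are hypothetical, all keys are kept as Strings, block "1" is assumed to start a new entry, and the text of block 4 is kept as one raw String.

```ruby
require 'strscan'

# Hypothetical helper: parses one "{key:value}" block at the current scanner
# position and returns [key, value]. The value is a String, or a Hash when
# the block contains nested blocks such as {3:{108:...}}.
def parse_block(scanner)
  scanner.scan(/\{/) or raise ArgumentError, "expected '{' at position #{scanner.pos}"
  key = scanner.scan(/[^:{}]+/) or raise ArgumentError, "expected a key at position #{scanner.pos}"
  scanner.scan(/:/) or raise ArgumentError, "expected ':' at position #{scanner.pos}"

  if scanner.check(/\{/)
    # Nested blocks: recurse until this block's closing brace is reached.
    value = {}
    while scanner.check(/\{/)
      nested_key, nested_value = parse_block(scanner)
      value[nested_key] = nested_value
    end
  else
    # Plain text up to the closing brace; it may span several lines.
    value = scanner.scan(/[^{}]*/)
  end

  scanner.scan(/\}/) or raise ArgumentError, "expected '}' at position #{scanner.pos}"
  [key, value]
end

# Hypothetical helper: collects the top-level blocks into one Hash per entry.
# Assumption: every entry begins with block "1".
def parse_messages(text)
  scanner = StringScanner.new(text)
  messages = []
  loop do
    scanner.skip(/\s+/)          # ignore whitespace between blocks
    break if scanner.eos?
    key, value = parse_block(scanner)
    messages << {} if key == "1" || messages.empty?
    messages.last[key] = value
  end
  messages
end
```

Calling parse_messages on the contents of the file would return one Hash per entry. The recursion lives entirely in parse_block, which calls itself whenever a value starts with another opening brace.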

I tried it with Treetop, but it does not seem to be the right tool: it looks geared towards input you want to evaluate (do calculations on), whereas I just want to extract the information.

grammar Mt940

  rule document
    part1:string spaces [:|/] spaces part2:document 
    {
      def eval(env={})
        return part1.eval, part2.eval
      end
    }
    / string
    /  '{' spaces document spaces '}' spaces
    {
      def eval(env={})
        return [document.eval]
      end
    }
  end
end

I also tried with a regular expression

matches = str.scan(/\A[{]?([0-9]+)[:]?([^}]*)[}]?\Z/i)

but handling the nested braces recursively with a regex is difficult.
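
A regular expression fits the flat part of an entry better than the nested braces. Assuming the text of block 4 has already been isolated as a String, its ":tag:value" lines could be split with String#scan as sketched below. The name split_fields is hypothetical, and the sketch assumes "\n" line endings, tags of two digits plus an optional capital letter (as in the sample above), and that a line not starting with ":" continues the previous field.

```ruby
# Hypothetical helper: splits the text of block 4 into [tag, value] pairs.
# Pairs are returned instead of a Hash so that a repeated tag is not lost.
def split_fields(block_text)
  # Drop the "-" line that closes the block (assumes "\n" line endings).
  body = block_text.sub(/\n-\s*\z/, "")

  # ":TAG:" at the start of a line, then the rest of that line plus any
  # following lines that do not start with ":" (continuation lines).
  body.scan(/^:(\d{2}[A-Z]?):(.*(?:\n(?!:).*)*)/)
end
```

Each element of the result is a two-element Array such as ["20", "XBS/091202/0001"]. Continuation lines stay inside the value, separated by "\n".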

How can I solve this problem?

