How to transform a csv to combine matching rows?

Posted by Christian Wolf on Super User See other posts from Super User or by Christian Wolf
Published on 2013-10-19T10:24:49Z Indexed on 2013/11/02 9:58 UTC
Read the original article Hit count: 261

Filed under:
|
|

I have a CSV file with some transaction data. Let's say date, volume, price and direction (sell/buy). Additionally there is a ID for each transaction and on each closing transaction (the newer one) there is a reference to the corresponding transaction. Classical database referencing.

Now I want to do some statistics and draw some plots. This could be done via Octave, LaTeX/TikZ, Gnuplot or whatever. To do this I need both buy and sell price in one row. My thought was to preprocess the CSV to get another CSV containing the needed information and then to do the statistics. In the end I'd like to have a solution based on scripts and not on a spreadsheet as data might change often (exported from online DB).

My actual solution (see http://paste.ubuntu.com/6262822/ ) is a bash script that parses the CSV line by line and checks if there exists a corresponding transaction. If found, a new row is written to the destination CSV. If not a warning is printed.

The bad news: For each row in the source file I have to read the whole file a few times. This causes long running times of 10sec for 300 lines. As the line number might rise soon (>10k lines), this is not perfect. I am aware, that there are many shells to be opened in the script which might cause the performance problems.

Now my questions:

  • Is bash/awk/sed/.... a good way to do things?
  • Should I first import all data into a "real" local database to use SQL?
  • Is there an easy way to achieve the desired results?

© Super User or respective owner

Related posts about bash

Related posts about script