How to transform a csv to combine matching rows?
- by Christian Wolf
I have a CSV file with some transaction data. Let's say date, volume, price and direction (sell/buy). Additionally there is a ID for each transaction and on each closing transaction (the newer one) there is a reference to the corresponding transaction. Classical database referencing.
Now I want to do some statistics and draw some plots. This could be done via Octave, LaTeX/TikZ, Gnuplot or whatever. To do this I need both buy and sell price in one row. My thought was to preprocess the CSV to get another CSV containing the needed information and then to do the statistics.
In the end I'd like to have a solution based on scripts and not on a spreadsheet as data might change often (exported from online DB).
My actual solution (see http://paste.ubuntu.com/6262822/ ) is a bash script that parses the CSV line by line and checks if there exists a corresponding transaction. If found, a new row is written to the destination CSV. If not a warning is printed.
The bad news: For each row in the source file I have to read the whole file a few times. This causes long running times of 10sec for 300 lines. As the line number might rise soon (10k lines), this is not perfect.
I am aware, that there are many shells to be opened in the script which might cause the performance problems.
Now my questions:
Is bash/awk/sed/.... a good way to do things?
Should I first import all data into a "real" local database to use SQL?
Is there an easy way to achieve the desired results?