Data munging and data import scripting
Posted
by morpheous
on Stack Overflow
See other posts from Stack Overflow
or by morpheous
Published on 2010-05-14T10:10:37Z
Indexed on
2010/05/14
10:14 UTC
Read the original article
Hit count: 272
I need to write some scripts to carry out some tasks on my server (running Ubuntu server 8.04 TLS). The tasks are to be run periodically, so I will be running the scripts as cron jobs.
I have divided the tasks into "group A" and "group B" - because (in my mind at least), they are a bit different.
Task Group A
import data from a file and possibly reformat it - by reformatting, I mean doing things like santizing the data, possibly normalizing it and or running calculations on 'columns' of the data
Import the munged data into a database. For now, I am mostly using mySQL for the vast majority of imports - although some files will be imported into a sqlLite database.
Note: The files will be mostly text files, although some of the files are in a binary format (my own proprietary format, written by a C++ application I developed).
Task Group B
- Extract data from the database
- Perform calculations on the data and either insert or update tables in the database.
My coding experience is is primarily as a C/C++ developer, although I have been using PHP as well for the last 2 years or so. I am from a windows background so I am still finding my feet in the linux environment.
My question is this - I need to write scripts to perform the tasks I described above. Although I suppose I could write a few C++ applications to be used in the shell scripts, I think it may be better to write them in a scripting language (maybe this is a flawed assumption?). My thinking is that it would be easier to modify thins in a script - no need to rebuild etc for changes to functionality. Additionally, C++ data munging in C++ tends to involve more lines of code than "natural" scripting languages such as Perl, Python etc.
Assuming that the majority of people on here agree that scripting is the way to go, herein lies my dilema. Which scripting language to use to perform the tasks above (giving my background).
My gut instinct tells me that Perl (shudder) would be the most obvious choice for performing all of the above tasks. BUT (and that is a big BUT). The mere mention of Perl makes my toes curl, as I had a very, very bag experience with it a while back. The syntax seems quite unnatural to me - despite how many times I have tried to learn it - so if possible, I would really like to give it a miss. PHP (which I already know), also am not sure is a good candidate for scripting on the CLI (I have not seen many examples on how to do this etc - so I may be wrong).
The last thing I must mention is that IF I have to learn a new language in order to do this, I cannot afford (time constraint) to spend more than a day, in learning the key commands/features required in order to do this (I can always learn the details of the language later, once I have actually deployed the scripts).
So, which scripting language would you recommend (PHP, Python, Perl, [insert your favorite here]) - and most importantly WHY?. Or, should I just stick to writing little C++ applications that I call in a shell script?.
Lastly, if you have suggested a scripting language, can you please show with a FEW lines (Perl mongers - I'm looking in your direction [nothing to cryptic!] ;) ) how I can use the language you suggested to do what I want to do. Hopefully, the lines you present will convince me that it can be done easily and elegantly in the language you suggested.
© Stack Overflow or respective owner