Simple, fast SQL queries for flat files.
- by plinehan
Does anyone know of any tools to provide simple, fast queries of flat files using a SQL-like declarative query language? I'd rather not pay the overhead of loading the file into a DB since the input data is typically thrown out almost immediately after the query is run.
Consider the data file, "animals.txt":
dog 15
cat 20
dog 10
cat 30
dog 5
cat 40
Suppose I want to extract the highest value for each unique animal. I would like to write something like:
cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"
I can get nearly the same result using sort:
cat animals.txt | sort -t " " -k1,1 -k2,2nr
And I can always drop into awk from there, but this all feels a bit awkward (couldn't resist) when a SQL-like language would seem to solve the problem so cleanly.
I've considered writing a wrapper for SQLite that would automatically create a table based on the input data, and I've looked into using Hive in single-processor mode, but I can't help but feel this problem has been solved before. Am I missing something? Is this functionality already implemented by another standard tool?
Halp!