Unix sort keys cause performance problems
- by KenFar
My data:
It's a 71 MB file with 1.5 million rows.
It has six fields.
All six fields together form a unique key, so that's what I need to sort on.
Sort statement:
sort -t ',' -k1,1 -k2,2 -k3,3 -k4,4 -k5,5 -k6,6 -o output.csv input.csv
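Each -kN,N option limits a key to exactly field N. sort compares the keys left to right and moves to the next key only on a tie, so the six options together act as one composite key. The sketch below uses made-up two-line input (so only two keys) and assumes the C locale (LC_ALL=C, plain byte order). It shows that this ordering is not always the same as sorting whole lines. A space (0x20) sorts below the comma (0x2C), so the unkeyed sort puts "a b,x" first. The keyed sort compares field 1 on its own ("a" vs "a b") and puts "a,y" first.

```shell
#!/bin/sh
# Hypothetical two-line input; LC_ALL=C forces plain byte-order comparison.
input='a b,x
a,y'

# Keyed sort: field 1 is compared on its own, and "a" sorts before "a b".
printf '%s\n' "$input" | LC_ALL=C sort -t ',' -k1,1 -k2,2
# a,y
# a b,x

# Whole-line sort: ' ' (0x20) sorts before ',' (0x2C), so "a b,x" comes first.
printf '%s\n' "$input" | LC_ALL=C sort
# a b,x
# a,y
```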
The problem:
If I sort without keys (comparing whole lines), it takes 30 seconds.
If I sort with the six keys above, it takes 660 seconds.
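The following is a minimal sketch for re-running the two timings side by side. It assumes a POSIX shell and an input file shaped like the one described above. The helper name compare_sorts and the .nokeys/.keys output names are hypothetical. Elapsed time is measured with date +%s, so it is whole wall-clock seconds only, and results will vary by machine and environment.

```shell
#!/bin/sh
# Hypothetical helper: sorts one CSV file with and without the six field keys
# and reports the elapsed wall-clock time of each run in whole seconds.
compare_sorts() {
  file=$1

  start=$(date +%s)
  sort -o "$file.nokeys" "$file"
  end=$(date +%s)
  echo "without keys: $((end - start)) s"

  start=$(date +%s)
  sort -t ',' -k1,1 -k2,2 -k3,3 -k4,4 -k5,5 -k6,6 -o "$file.keys" "$file"
  end=$(date +%s)
  echo "with keys: $((end - start)) s"
}

# Usage, assuming input.csv is in the current directory:
# compare_sorts input.csv
```

Both runs read the same file, so any difference in the reported seconds comes from the key options rather than from I/O on different inputs.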
I…