Read random lines from huge CSV file in Python

Posted by jbssm on Stack Overflow
Published on 2012-05-30T15:56:58Z Indexed on 2012/05/30 16:41 UTC

I have a fairly large CSV file (15 GB) and I need to read about 1 million random lines from it. As far as I can see, and as far as I have managed to implement, Python's csv module only allows iterating sequentially over the file.

Reading the whole file into memory in order to pick random lines uses too much memory, and going through the whole file, discarding some values and keeping others, takes too much time. Is there any way to choose a random line from the CSV file and read only that line?

I tried the following, without success:

    import csv

    with open('linear_e_LAN2A_F_0_435keV.csv') as file:
        reader = csv.reader(file)
        # Fails: a csv.reader object is an iterator and cannot be indexed
        print reader[someRandomInteger]
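
For reference, indexing fails here because `csv.reader` returns an iterator, which has no random access. If one sequential pass over the file turns out to be acceptable, reservoir sampling keeps only `k` rows in memory at any time. The following is a minimal sketch, assuming Python 3; the names `reservoir_sample` and `sample_csv_rows` are made up for illustration and are not part of the csv module.

```python
import csv
import random


def reservoir_sample(rows, k, rng=random):
    """Return up to k items chosen uniformly at random from an iterable.

    Makes a single pass and never holds more than k items in memory.
    """
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def sample_csv_rows(path, k):
    """Hypothetical helper: sample k parsed rows from the CSV file at path."""
    with open(path, newline='') as f:
        return reservoir_sample(csv.reader(f), k)
```

This still reads all 15 GB once, so it only addresses the memory concern, not the time spent scanning the file.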

A sample of the CSV file:

    331.093,329.735
    251.188,249.994
    374.468,373.782
    295.643,295.159
    83.9058,0
    380.709,116.221
    352.238,351.891
    183.809,182.615
    257.277,201.302
    61.4598,40.7106
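
Since every row in this sample is a short line with no quoted fields, one way to avoid scanning the whole file might be to seek to a random byte offset and read the next complete line. The sketch below assumes Python 3, a non-empty file, and no newlines embedded inside fields; `random_row` and `sample_rows_by_seek` are hypothetical names. It samples with replacement, so duplicates are possible, and a row that follows a long line is slightly more likely to be picked than one that follows a short line.

```python
import csv
import os
import random


def random_row(f, file_size, rng=random):
    """Return the parsed row that follows a random byte offset.

    f must be opened in binary mode so that seek() accepts any offset.
    """
    f.seek(rng.randrange(file_size))
    f.readline()             # discard the rest of the line we landed in
    line = f.readline()      # the next complete line
    if not line:             # we landed in the last line: wrap to the first
        f.seek(0)
        line = f.readline()
    text = line.decode('utf-8').strip()
    return next(csv.reader([text]), [])


def sample_rows_by_seek(path, k, rng=random):
    """Return k rows, each read at an independently chosen random offset."""
    size = os.path.getsize(path)  # assumed to be greater than zero
    with open(path, 'rb') as f:
        return [random_row(f, size, rng) for _ in range(k)]
```

Each row costs one seek and two short reads, so the running time depends on the number of rows requested rather than on the size of the file.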

© Stack Overflow or respective owner
