Reading CSV files in SciPy/NumPy in Python

Posted by user248237 on Stack Overflow
Published on 2010-05-18T17:05:18Z Indexed on 2010/05/18 19:50 UTC

Filed under: csv | scipy | numpy | python | matplotlib

I am having trouble reading a tab-delimited CSV file in Python. I use the following function:

from numpy import array, genfromtxt

def csv2array(filename, skiprows=0, delimiter='\t', raw_header=False, missing=None, with_header=True):
    """
    Parse a file into an array. Return the array and the skipped header lines. By default,
    parse the header lines into dictionaries, assuming the parameters are numeric,
    using 'parse_header'.
    """
    # Collect the first `skiprows` lines, either raw or parsed by parse_header
    # (a helper defined elsewhere, not shown here).
    f = open(filename, 'r')
    skipped_rows = []
    for n in range(skiprows):
        header_line = f.readline().strip()
        if raw_header:
            skipped_rows.append(header_line)
        else:
            skipped_rows.append(parse_header(header_line))
    f.close()
    # Note: later NumPy releases spell these keywords skip_header= and missing_values=.
    if missing:
        data = genfromtxt(filename, dtype=None, names=with_header,
                          deletechars='', skiprows=skiprows, missing=missing)
    else:
        if delimiter != '\t':
            data = genfromtxt(filename, dtype=None, names=with_header, delimiter=delimiter,
                              deletechars='', skiprows=skiprows)
        else:
            # Tab-delimited case: no delimiter argument is passed to genfromtxt here.
            data = genfromtxt(filename, dtype=None, names=with_header,
                              deletechars='', skiprows=skiprows)
    # A file with a single data row comes back as a 0-d array; wrap it in a 1-d array.
    if data.ndim == 0:
        data = array([data.item()])
    return (data, skipped_rows)

The problem is that genfromtxt complains about my files, for example with this error:

Line #27100 (got 12 columns instead of 16)

I am not sure where these errors come from. Any ideas?

Here's an example file that causes the problem:

#Gene   120-1   120-3   120-4   30-1    30-3    30-4    C-1 C-2 C-5 genesymbol  genedesc
ENSMUSG00000000001  7.32    9.5 7.76    7.24    11.35   8.83    6.67    11.35   7.12    Gnai3   guanine nucleotide binding protein alpha
ENSMUSG00000000003  0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Pbsn    probasin
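
One plausible source of the mismatch, assuming the real file is tab-delimited and the genedesc column contains spaces as in the sample above: when delimiter is '\t', the function passes no delimiter to genfromtxt, and genfromtxt then splits on any run of whitespace, so rows with longer descriptions yield more columns than rows with short ones. The sketch below uses only the standard library to compare field counts under both splitting rules. The helper name `ragged_rows` and the shortened `sample` string are illustrative assumptions, not part of the original file.

```python
import io

def ragged_rows(lines, delimiter="\t"):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    The expected count is taken from the first non-empty line; every later
    line whose field count differs is reported. Passing delimiter=None
    splits on any run of whitespace, like genfromtxt's default.
    """
    expected = None
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        n_fields = len(line.split(delimiter))
        if expected is None:
            expected = n_fields
        elif n_fields != expected:
            bad.append((lineno, n_fields))
    return expected, bad

# Shortened, hypothetical version of the example file (fewer numeric columns).
sample = (
    "#Gene\t120-1\t120-3\tgenesymbol\tgenedesc\n"
    "ENSMUSG00000000001\t7.32\t9.5\tGnai3\tguanine nucleotide binding protein alpha\n"
    "ENSMUSG00000000003\t0.0\t0.0\tPbsn\tprobasin\n"
)

by_tab = ragged_rows(io.StringIO(sample), delimiter="\t")
by_whitespace = ragged_rows(io.StringIO(sample), delimiter=None)
```

Splitting on tabs reports no ragged rows for this sample, while splitting on whitespace flags line 2 because its description breaks into several fields. Running the same check over the real file, with `open(filename)` in place of the `StringIO` object, should show whether line 27100 is ragged for this reason.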

Is there a better way to write a generic csv2array function? Thanks.
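
One possible shape for a simpler reader, offered as a sketch under stated assumptions rather than a definitive answer: always hand the delimiter to genfromtxt, so that a tab-delimited file is never split on spaces inside a field. The function name `tsv2array`, the `encoding` choice, and the shortened `sample` data are assumptions made for illustration; `skip_header` is the keyword current NumPy releases use where the function above uses `skiprows`.

```python
import io
import numpy as np

def tsv2array(source, delimiter="\t", skip_header=0):
    """Read delimited text into a 1-d structured array (illustrative sketch).

    The delimiter is always passed through, so spaces inside a field
    (for example a gene description) do not create extra columns.
    """
    data = np.genfromtxt(
        source,
        delimiter=delimiter,
        dtype=None,           # infer a type per column
        names=True,           # first remaining line holds the column names
        deletechars="",       # keep characters such as '-' in column names
        skip_header=skip_header,
        encoding="utf-8",     # assumption: the file is UTF-8 text
    )
    # A file with a single data row yields a 0-d array; promote it to 1-d.
    return np.atleast_1d(data)

# Shortened, hypothetical version of the example file.
sample = (
    "#Gene\t120-1\tgenesymbol\tgenedesc\n"
    "ENSMUSG00000000001\t7.32\tGnai3\tguanine nucleotide binding protein alpha\n"
    "ENSMUSG00000000003\t0.0\tPbsn\tprobasin\n"
)

arr = tsv2array(io.StringIO(sample))
```

With the explicit tab delimiter, both rows parse into four fields and the full description stays in one column. If descriptions in the real file can contain the '#' character, the `comments` argument of genfromtxt may also need attention, since text after a comment marker is dropped from the line.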
