rpy2: Converting a data.frame to a numpy array

Posted by Mike Dewar on Stack Overflow See other posts from Stack Overflow or by Mike Dewar
Published on 2010-04-19T17:18:19Z Indexed on 2010/04/19 17:23 UTC
Read the original article Hit count: 352

Filed under:
|
|
|
|

I have a data.frame in R. It contains a lot of data : gene expression levels from many (125) arrays. I'd like the data in Python, due mostly to my incompetence in R and the fact that this was supposed to be a 30 minute job.

I would like the following code to work. To understand this code, know that the variable path contains the full path to my data set which, when loaded, gives me a variable called immgen. Know that immgen is an object (a Bioconductor ExpressionSet object) and that exprs(immgen) returns a data frame with 125 columns (experiments) and tens of thousands of rows (named genes).

robjects.r("load('%s')"%path) # loads immgen
e = robjects.r['data.frame']("exprs(immgen)")
expression_data = np.array(e)

This code runs, but expression_data is simply array([[1]]).

I'm pretty sure that e doesn't represent the data frame generated by exprs() due to things like:

In [40]: e._get_ncol()
Out[40]: 1

In [41]: e._get_nrow()
Out[41]: 1

But then again who knows? Even if e did represent my data.frame, that it doesn't convert straight to an array would be fair enough - a data frame has more in it than an array (rownames and colnames) and so maybe life shouldn't be this easy. However I still can't work out how to perform the conversion. The documentation is a bit too terse for me, though my limited understanding of the headings in the docs implies that this should be possible.

Anyone any thoughts?

© Stack Overflow or respective owner

Related posts about python

Related posts about numpy