Search Results

Search found 3 results on 1 pages for 'bioconductor'.

Page 1/1 | 1 

  • R + Bioconductor : combining probesets in an ExpressionSet

    - by Mike Dewar
    Hi, First off, this may be the wrong Forum for this question, as it's pretty darn R+Bioconductor specific. Here's what I have: library('GEOquery') GDS = getGEO('GDS785') cd4T = GDS2eSet(GDS) cd4T <- cd4T[!fData(cd4T)$symbol == "",] Now cd4T is an ExpressionSet object which wraps a big matrix with 19794 rows (probesets) and 15 columns (samples). The final line gets rid of all probesets that do not have corresponding gene symbols. Now the trouble is that most genes in this set are assigned to more than one probeset. You can see this by doing gene_symbols = factor(fData(cd4T)$Gene.symbol) length(gene_symbols)-length(levels(gene_symbols)) [1] 6897 So only 6897 of my 19794 probesets have unique probeset - gene mappings. I'd like to somehow combine the expression levels of each probeset associated with each gene. I don't care much about the actual probe id for each probe. I'd like very much to end up with an ExpressionSet containing the merged information as all of my downstream analysis is designed to work with this class. I think I can write some code that will do this by hand, and make a new expression set from scratch. However, I'm assuming this can't be a new problem and that code exists to do it, using a statistically sound method to combine the gene expression levels. I'm guessing there's a proper name for this also but my googles aren't showing up much of use. Can anyone help?

    Read the article

  • rpy2: Converting a data.frame to a numpy array

    - by Mike Dewar
    I have a data.frame in R. It contains a lot of data : gene expression levels from many (125) arrays. I'd like the data in Python, due mostly to my incompetence in R and the fact that this was supposed to be a 30 minute job. I would like the following code to work. To understand this code, know that the variable path contains the full path to my data set which, when loaded, gives me a variable called immgen. Know that immgen is an object (a Bioconductor ExpressionSet object) and that exprs(immgen) returns a data frame with 125 columns (experiments) and tens of thousands of rows (named genes). robjects.r("load('%s')"%path) # loads immgen e = robjects.r['data.frame']("exprs(immgen)") expression_data = np.array(e) This code runs, but expression_data is simply array([[1]]). I'm pretty sure that e doesn't represent the data frame generated by exprs() due to things like: In [40]: e._get_ncol() Out[40]: 1 In [41]: e._get_nrow() Out[41]: 1 But then again who knows? Even if e did represent my data.frame, that it doesn't convert straight to an array would be fair enough - a data frame has more in it than an array (rownames and colnames) and so maybe life shouldn't be this easy. However I still can't work out how to perform the conversion. The documentation is a bit too terse for me, though my limited understanding of the headings in the docs implies that this should be possible. Anyone any thoughts?

    Read the article

  • R: getting "inside" environments

    - by Mike Dewar
    Given an environment object e: > e <environment: 0x10f0a6e98> > class(e) [1] "environment" How do you access the variables inside the environment? Just in case you're curious, I have found myself with this environment object. I didn't make it, a package in Bioconductor made it. You can make it, too, using these commands: library('GEOquery') eset <- getGEO("GSE4142")[[1]] e <- assayData(eset)

    Read the article

1