lapply slower than for-loop when used for a BiomaRt query. Is that expected?
Posted by ptocquin on Stack Overflow
Published on 2012-09-05T19:49:47Z
I would like to query a database using the biomaRt package. I have loci and want to retrieve some related information, let's say the description.
I first tried lapply but was surprised by how long the task took. I then tried a more basic for-loop and got a faster result.
Is that expected, or is something wrong with my code or with my understanding of apply? I have read other posts dealing with *apply vs for-loop performance (here, for example) and I was aware that improved performance should not be expected, but I don't understand why performance here is actually lower.
Here is a reproducible example.
1) Loading the library and selecting the database:
library("biomaRt")
athaliana <- useMart("plants_mart_14")
athaliana <- useDataset("athaliana_eg_gene",mart=athaliana)
2) Querying the database:
loci <- c("at1g01300", "at1g01800", "at1g01900", "at1g02335", "at1g02790",
"at1g03220", "at1g03230", "at1g04040", "at1g04110", "at1g05240"
)
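I created a function for use in lapply:
foo <- function(loci) {
  # Return the description for the given TAIR locus identifier(s)
  getBM(attributes = "description", filters = "tair_locus",
        values = loci, mart = athaliana)
}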
When I use this function on the first element:
> system.time(foo(loci[1]))
   user  system elapsed
  0.020   0.004   1.599
When I use lapply to retrieve the data for all values:
> system.time(lapply(loci, foo))
   user  system elapsed
  0.220   0.000  16.376
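In these timings the first two columns are CPU time used by R and the third is wall-clock time, so a large third value next to small first two means the session was mostly waiting (here, presumably on the remote server) rather than computing. A minimal sketch of that pattern, using Sys.sleep as a stand-in for a slow remote call (an assumption for illustration, not the biomaRt code path):

```r
# Sys.sleep() stands in for waiting on a remote server (illustrative assumption).
timing <- system.time(Sys.sleep(0.5))

# CPU time actually consumed by the R process
cpu_seconds <- timing[["user.self"]] + timing[["sys.self"]]

# Wall-clock time, which includes the time spent waiting
wall_seconds <- timing[["elapsed"]]

print(timing)
```

When the wall-clock figure dominates like this, the choice between lapply and a for-loop is unlikely to matter much; the number of remote calls is what counts.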
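I then created a new function, using a for-loop:
foo2 <- function(loci) {
  # i takes each locus value in turn, so it is passed to getBM directly;
  # loci[i] would index an unnamed vector by value and yield NA
  for (i in loci) {
    getBM("description", "tair_locus", i, athaliana)
  }
}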
Here is the result:
> system.time(foo2(loci))
   user  system elapsed
  0.204   0.004  10.919
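One detail worth checking in a loop of this kind: in `for (i in loci)`, i takes each value of the vector in turn, not its position, so it matters whether the query receives i or loci[i]. A minimal base-R sketch with a shortened copy of the vector (no biomaRt needed):

```r
loci <- c("at1g01300", "at1g01800", "at1g01900")

# Indexing an unnamed character vector by one of its own values looks for an
# element with that *name*; there is none, so each lookup gives NA.
by_value <- vapply(loci, function(i) loci[i], character(1), USE.NAMES = FALSE)

# Indexing by integer position returns the locus itself.
by_position <- vapply(seq_along(loci), function(k) loci[k], character(1))

print(by_value)
print(by_position)
```

A loop that sends NA instead of a real identifier does less work on the server side, which could make it look faster than the equivalent lapply call.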
Of course, this will be applied to a big list of loci, so the best performing option is needed. Thank you for your assistance.
EDIT: Following the recommendation of @MartinMorgan, simply passing the vector loci to getBM greatly improves the query efficiency. Simpler is better.
> system.time(lapply(loci, foo))
   user  system elapsed
  0.236   0.024 110.512
> system.time(foo2(loci))
   user  system elapsed
  0.208   0.040 116.099
> system.time(foo(loci))
   user  system elapsed
  0.028   0.000   6.193
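A plausible reading of these numbers is that each getBM call costs one round trip to the server, so calling it once per locus multiplies that cost while a single vectorised call pays it once. The sketch below illustrates the call counts only; fake_query is a hypothetical stand-in written for this example, not part of biomaRt, and it makes no network request.

```r
# Hypothetical stand-in for a remote query such as getBM(): it counts how many
# times it is called and returns one row per requested locus.
n_calls <- 0
fake_query <- function(values) {
  n_calls <<- n_calls + 1
  data.frame(tair_locus = values,
             description = paste("description of", values),
             stringsAsFactors = FALSE)
}

loci <- c("at1g01300", "at1g01800", "at1g01900", "at1g02335", "at1g02790")

# One call per locus: the number of round trips grows with length(loci).
n_calls <- 0
per_locus <- do.call(rbind, lapply(loci, fake_query))
calls_per_locus <- n_calls

# One call for the whole vector: a single round trip.
n_calls <- 0
vectorised <- fake_query(loci)
calls_vectorised <- n_calls

cat("per-locus calls:", calls_per_locus, "\n")
cat("vectorised calls:", calls_vectorised, "\n")
```

Both approaches return the same rows; only the number of calls differs, which is where the elapsed time goes when every call waits on a server.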
© Stack Overflow or respective owner