lapply slower than for-loop when used for a biomaRt query. Is that expected?
- by ptocquin
I would like to query a database using the biomaRt package. I have loci and want to retrieve some related information, let's say the description.
I first tried lapply but was surprised by how long the task took. I then tried a more basic for-loop and got a faster result.
Is that expected, or is something wrong with my code or with my understanding of apply? I have read other posts dealing with *apply vs for-loop performance (here, for example), so I knew not to expect better performance from lapply, but I don't understand why its performance here is actually worse.
Here is a reproducible example.
1) Loading the library and selecting the database:
library("biomaRt")
athaliana <- useMart("plants_mart_14")
athaliana <- useDataset("athaliana_eg_gene",mart=athaliana)
2) Querying the database:
loci <- c("at1g01300", "at1g01800", "at1g01900", "at1g02335", "at1g02790",
"at1g03220", "at1g03230", "at1g04040", "at1g04110", "at1g05240"
)
I create a function for use in lapply:
foo <- function(loci) {
  getBM("description", "tair_locus", loci, athaliana)
}
When I use this function on the first element:
> system.time(foo(loci[1]))
   user  system elapsed
  0.020   0.004   1.599
When I use lapply to retrieve the data for all values:
> system.time(lapply(loci, foo))
   user  system elapsed
  0.220   0.000  16.376
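I then created a new function, adding a for-loop:
foo2 <- function(loci) {
  for (i in loci) {
    # i is already a locus ID, so pass it directly
    # (loci[i] would index by name and yield NA on this unnamed vector)
    getBM("description", "tair_locus", i, athaliana)
  }
}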
Here is the result:
> system.time(foo2(loci))
   user  system elapsed
  0.204   0.004  10.919
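Note that foo2 discards each result, whereas lapply keeps them all. For a like-for-like comparison, the loop would also need to store what it retrieves. A minimal sketch, assuming a stand-in function fetch_one (hypothetical, used in place of the getBM call so the example runs without a network connection) and a hypothetical loop wrapper foo_loop:

```r
# Hypothetical stand-in for foo(); a real run would call getBM() here.
fetch_one <- function(locus) {
  data.frame(tair_locus = locus,
             description = paste("description of", locus),
             stringsAsFactors = FALSE)
}

# for-loop version that stores every result, as lapply does
foo_loop <- function(loci, fetch) {
  out <- vector("list", length(loci))  # preallocate the result list
  for (k in seq_along(loci)) {
    out[[k]] <- fetch(loci[k])         # index by position, not by value
  }
  out
}

loci <- c("at1g01300", "at1g01800", "at1g01900")
res_loop   <- foo_loop(loci, fetch_one)
res_lapply <- lapply(loci, fetch_one)

# Both approaches should return the same list of data frames
print(identical(res_loop, res_lapply))
```

With the real query, both versions make one request per locus, so the elapsed time is probably dominated by the round trips to the server rather than by the choice between lapply and for.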
Of course, this will be applied to a large list of loci, so I need the best-performing option. Thank you for your help.
EDIT: Following the recommendation of @MartinMorgan
Simply passing the whole vector loci to getBM in a single call greatly improves the query efficiency. Simpler is better.
> system.time(lapply(loci, foo))
   user  system elapsed
  0.236   0.024 110.512
> system.time(foo2(loci))
   user  system elapsed
  0.208   0.040 116.099
> system.time(foo(loci))
   user  system elapsed
  0.028   0.000   6.193
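For a much larger list of loci, one possible middle ground is to send the IDs in batches, so that each request still carries many values. The sketch below is only an illustration: query_in_batches and fake_fetch are hypothetical names (not part of biomaRt), the batch size is arbitrary, and fake_fetch stands in for the real query so the example runs offline.

```r
# Hypothetical helper (not part of biomaRt): query IDs in batches rather
# than one at a time, so each request carries many values.
query_in_batches <- function(ids, fetch, batch_size = 500) {
  # Assign each ID to a batch: 1, 1, ..., 2, 2, ...
  batches <- unname(split(ids, ceiling(seq_along(ids) / batch_size)))
  # One call to `fetch` per batch; each call should return a data.frame
  results <- lapply(batches, fetch)
  # Stack the per-batch data frames into a single one
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Stand-in for a real query, so the sketch runs without a network connection.
# With biomaRt one would instead pass something like:
#   function(x) getBM(c("tair_locus", "description"), "tair_locus", x, athaliana)
fake_fetch <- function(x) {
  data.frame(tair_locus = x,
             description = paste("description of", x),
             stringsAsFactors = FALSE)
}

loci <- c("at1g01300", "at1g01800", "at1g01900", "at1g02335", "at1g02790")
res <- query_in_batches(loci, fake_fetch, batch_size = 2)
print(res)
```

Requesting the filter column (tair_locus) alongside description is worth considering with a vectorized query, because the rows returned by the server are not guaranteed to come back in the order of the input vector.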