How do I replace values within a data frame with a string in R?
- by Arturito
short version: How do I replace values within a data frame with a string found within another data frame?
longer version: I'm a biologist working with many species of bees. I have a data set with many thousands of bees. Each row has a unique bee ID # along with all the relevant info about that specimen (data of capture, GPS location, etc). The species information for each bee has not been entered because it takes a long time to ID them. When IDing, I end up with boxes of hundred of bees, all of the same species. I enter these into a separate data frame. I am trying to write code that will update the original data file with species information (family, genus, species, sex, etc) as I ID the bees. Currently, in the original data file, the species info is blank and is interpreted as NA within R. I want to have R find all unique bee ID #'s and fill in the species info, but I am having trouble figuring out how to replace the NA values with a string (e.g. "Andrenidae")
Here is a simple example of what I am trying to do:
rawData<-data.frame(beeID=c(1:20),family=rep(NA,20))
speciesInfo<-data.frame(beeID=seq(1,20,3),family=rep("Andrenidae",7))
rawData[rawData$beeID == 4,"family"] <- speciesInfo[speciesInfo$beeID == 4,"family"]
So, I am replacing things as I want, but with a number rather than the family name (a string). What I would eventually like to do is write a little loop to add in all the species info, e.g.:
for (i in speciesInfo$beeID){
rawData[rawData$beeID == i,"family"] <- speciesInfo[speciesInfo$beeID == i,"family"]
}
Thanks in advance for any advice!
Cheers,
Zak
EDIT:
I just noticed that the first two methods below add a new column each time, which would cause problems if I needed to add species info multiple times (which I typically do). For example:
rawData<-data.frame(beeID=c(1:20),family=rep(NA,20))
Andrenidae<-data.frame(beeID=seq(1,20,3),family=rep("Andrenidae",7))
Halictidae<-data.frame(beeID=seq(1,20,3)+1,family=rep("Halictidae",7))
# using join
library(plyr)
rawData <- join(rawData, Andrenidae, by = "beeID", type = "left")
rawData <- join(rawData, Halictidae, by = "beeID", type = "left")
# using merge
rawData <- merge(x=rawData,y=Andrenidae,by='beeID',all.x=T,all.y=F)
rawData <- merge(x=rawData,y=Halictidae,by='beeID',all.x=T,all.y=F)
Is there a way to either collapse the columns so that I have one, unified data frame? Or a way to update the rawData rather than adding a new column each time? Thanks in advance!