How do I replace values within a data frame with a string in R?

Posted by Arturito on Stack Overflow See other posts from Stack Overflow or by Arturito
Published on 2012-09-11T13:01:03Z Indexed on 2012/09/12 3:38 UTC
Read the original article Hit count: 261

Filed under:
|
|
|

short version: How do I replace values within a data frame with a string found within another data frame?

longer version: I'm a biologist working with many species of bees. I have a data set with many thousands of bees. Each row has a unique bee ID # along with all the relevant info about that specimen (data of capture, GPS location, etc). The species information for each bee has not been entered because it takes a long time to ID them. When IDing, I end up with boxes of hundred of bees, all of the same species. I enter these into a separate data frame. I am trying to write code that will update the original data file with species information (family, genus, species, sex, etc) as I ID the bees. Currently, in the original data file, the species info is blank and is interpreted as NA within R. I want to have R find all unique bee ID #'s and fill in the species info, but I am having trouble figuring out how to replace the NA values with a string (e.g. "Andrenidae")

Here is a simple example of what I am trying to do:

rawData<-data.frame(beeID=c(1:20),family=rep(NA,20))
speciesInfo<-data.frame(beeID=seq(1,20,3),family=rep("Andrenidae",7))

rawData[rawData$beeID == 4,"family"]  <- speciesInfo[speciesInfo$beeID == 4,"family"]

So, I am replacing things as I want, but with a number rather than the family name (a string). What I would eventually like to do is write a little loop to add in all the species info, e.g.:

for (i in speciesInfo$beeID){
  rawData[rawData$beeID == i,"family"]  <- speciesInfo[speciesInfo$beeID == i,"family"]
}

Thanks in advance for any advice!

Cheers,

Zak

EDIT:

I just noticed that the first two methods below add a new column each time, which would cause problems if I needed to add species info multiple times (which I typically do). For example:

rawData<-data.frame(beeID=c(1:20),family=rep(NA,20))
Andrenidae<-data.frame(beeID=seq(1,20,3),family=rep("Andrenidae",7))
Halictidae<-data.frame(beeID=seq(1,20,3)+1,family=rep("Halictidae",7))

# using join
library(plyr)
rawData <- join(rawData, Andrenidae, by = "beeID", type = "left")
rawData <- join(rawData, Halictidae, by = "beeID", type = "left")

# using merge
rawData <- merge(x=rawData,y=Andrenidae,by='beeID',all.x=T,all.y=F)
rawData <- merge(x=rawData,y=Halictidae,by='beeID',all.x=T,all.y=F)

Is there a way to either collapse the columns so that I have one, unified data frame? Or a way to update the rawData rather than adding a new column each time? Thanks in advance!

© Stack Overflow or respective owner

Related posts about string

Related posts about r