Compare two character values within each subset

Posted by schultem on Stack Overflow
Published on 2012-09-04T21:21:55Z; indexed on 2012-09-04 21:38 UTC

Filed under: plyr

I have a simple data frame with three columns:

df <- data.frame(x = c(1,1,2,2,3), 
                 y = c(rep(1:2,2),1), 
                 target = c('a','a','a','b','a'))

I would like to compare the strings in the target column within every level of x (rows sharing the same value of x) and find out whether they are equal, i.e., get TRUE or FALSE. First I would like to compare rows 1 and 2, then rows 3 and 4, and so on. My problem is that some comparisons are incomplete: for example, x = 3 (row 5) has only one case instead of two, so it should come out as FALSE. The variable y indicates the first and second case within each x.

I played around with ddply doing something like:

ddply(df, .(x), summarise,
        ifelse(as.character(df[df$y == '1',]$target), 
               as.character(df[df$y == '2',]$target),0,1))

which is ugly ... and does not work ...

Any insights into how I could achieve this comparison?

Thanks
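
For reference, here is a minimal sketch of one way the comparison could be done. It is not an answer from the original thread. The attempt above fails for two reasons: ifelse() takes exactly three arguments (test, yes, no), and indexing df inside ddply refers to the whole data frame rather than the current group. The helper same_target and the column name same are hypothetical names introduced only for this illustration, and stringsAsFactors = FALSE is added to the data frame as an assumption to keep target a character column.

```r
library(plyr)

df <- data.frame(x = c(1, 1, 2, 2, 3),
                 y = c(rep(1:2, 2), 1),
                 target = c('a', 'a', 'a', 'b', 'a'),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE only if the group has exactly one first case
# (y == 1) whose target is identical to that of its second case (y == 2).
same_target <- function(target, y) {
  first  <- as.character(target[y == 1])
  second <- as.character(target[y == 2])
  # A missing second case makes `second` empty, so identical() is FALSE.
  length(first) == 1 && identical(first, second)
}

# plyr version: the function receives one sub-data-frame per level of x.
res <- ddply(df, .(x), function(piece) {
  data.frame(same = same_target(piece$target, piece$y))
})

# Base R equivalent without plyr: split by x, then apply the same helper.
base_res <- vapply(split(df, df$x),
                   function(piece) same_target(piece$target, piece$y),
                   logical(1))

print(res)
```

With the example data, res has one row per level of x, and the group x = 3 is FALSE because it has no second case to compare against.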
