Collapsing data frame by selecting one row per group

Posted by jkebinger on Stack Overflow
Published on 2010-04-13T02:17:18Z

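I'm trying to collapse a data frame by removing all but one row from each group of rows with identical values in a particular column. In other words, I want to keep only the first row from each group once the rows are sorted (in the example below, the row with the largest y for each value of x).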

For example, I'd like to convert this

> d = data.frame(x=c(1,1,2,4),y=c(10,11,12,13),z=c(20,19,18,17))
> d
  x  y  z
1 1 10 20
2 1 11 19
3 2 12 18
4 4 13 17

into this:

    x  y  z
1   1 11 19
2   2 12 18
3   4 13 17

I'm using aggregate to do this currently, but the performance is unacceptable with more data:

> d.ordered = d[order(-d$y),]
> aggregate(d.ordered,by=list(key=d.ordered$x),FUN=function(x){x[1]})
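For comparison, here is a minimal sketch of one possible alternative to aggregate, assuming base R's duplicated() is acceptable: sort so that the row to keep comes first within each x group, then drop every row whose x value has already been seen. The name result is only illustrative, and I have not measured this on large data.

```r
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Sort by x, and within each x by descending y, so the row to keep comes first.
d.ordered <- d[order(d$x, -d$y), ]

# duplicated() flags every repeat of an x value after its first occurrence,
# so negating it keeps exactly one row per group.
result <- d.ordered[!duplicated(d.ordered$x), ]

# Renumber the rows 1..n (cosmetic only).
rownames(result) <- NULL
result
```

This works on a single logical vector over the key column rather than calling a function once per group and per column.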

I've tried split/unsplit with the same function argument as here, but unsplit complains about duplicate row numbers.
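A sketch of the split-based idea that avoids unsplit, assuming it is fine to reassemble the pieces with do.call(rbind, ...) instead. The names pieces, firsts and result are illustrative, and this is not expected to be faster than aggregate; it only shows a way around the row-number complaint.

```r
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))
d.ordered <- d[order(-d$y), ]

# One data frame per distinct value of x; row order within each piece is preserved.
pieces <- split(d.ordered, d.ordered$x)

# Keep the first row of each piece as a one-row data frame.
firsts <- lapply(pieces, function(g) g[1, , drop = FALSE])

# Stack the one-row data frames back into a single data frame.
result <- do.call(rbind, firsts)
result
```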

Is rle a possibility? Is there an R idiom to convert rle's lengths vector into the indices of the rows that start each run, which I can then use to pluck those rows out of the data frame?
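A minimal sketch of the rle idea, assuming the data frame is first sorted so that equal x values are adjacent (rle only sees consecutive runs). The start of each run is one position past the end of the previous run, which a cumulative sum over the run lengths gives directly. The names runs and starts are illustrative.

```r
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Sort by x so each group forms one run, with the largest y first within a group.
d.ordered <- d[order(d$x, -d$y), ]

runs <- rle(d.ordered$x)

# The first run starts at row 1; each later run starts after the
# combined length of all runs before it.
starts <- cumsum(c(1, head(runs$lengths, -1)))

# Pluck the first row of each run.
result <- d.ordered[starts, ]
result
```

The selected rows keep their original row names from d; reset them with rownames(result) <- NULL if sequential numbering is wanted.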
