Select only the first rows for each unique value of a column in R

Posted by dmvianna on Stack Overflow
Published on 2012-11-07T22:45:17Z

Filed under: r | data.frame | sqldf

From a data frame like this:

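# Example data: each id appears twice, each time with a different string
test <- data.frame('id'= rep(1:5,2), 'string'= LETTERS[1:10])
# Sort by id, then renumber the rows 1..10
test <- test[order(test$id), ]
rownames(test) <- 1:10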

> test
    id string
 1   1      A
 2   1      F
 3   2      B
 4   2      G
 5   3      C
 6   3      H
 7   4      D
 8   4      I
 9   5      E
 10  5      J

I want to create a new data frame containing the first appearance of each id, together with its string. If sqldf accepted R code within it, the query could look like this:

res <- sqldf("select id, min(rownames(test)), string 
              from test 
              group by id")

> res
    id string
 1   1      A
 3   2      B
 5   3      C
 7   4      D
 9   5      E
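One base R way to get the result shown above without any SQL, offered as a sketch rather than the poster's own method: `duplicated()` flags every repeat of a value after its first occurrence, so negating it keeps exactly the first row for each `id`, in the data frame's current row order.

```r
# Rebuild the example data frame from the question
test <- data.frame(id = rep(1:5, 2), string = LETTERS[1:10])
test <- test[order(test$id), ]
rownames(test) <- 1:10

# duplicated() is TRUE for the second and later occurrences of each id,
# so !duplicated() keeps only the first row per id (rows 1, 3, 5, 7 and 9 here)
res <- test[!duplicated(test$id), ]
print(res)
```

The original row names are preserved, so no extra `row` column is needed to see where each row came from. If the first occurrence of each id/string combination were wanted instead, `duplicated()` also accepts a data frame, e.g. `test[!duplicated(test[c("id", "string")]), ]`.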

Is there a solution short of creating a new column like

test$row <- rownames(test)

and running the same sqldf query with min(row)?
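Staying within sqldf, a possible alternative to adding a `row` column is SQLite's implicit `rowid`. This sketch assumes sqldf's default SQLite backend and that the data frame is written to the SQLite table in its current row order (so `rowid` 1..10 matches the sorted rows). It also relies on SQLite's behaviour that, when a query has a single `min()` aggregate, bare columns such as `string` are taken from the row holding the minimum. The block only runs if sqldf is installed.

```r
# Same example data as in the question
test <- data.frame(id = rep(1:5, 2), string = LETTERS[1:10])
test <- test[order(test$id), ]
rownames(test) <- 1:10

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # rowid: SQLite's implicit row number (assumes the default SQLite backend).
  # With one min() aggregate, SQLite takes `string` from the row with the smallest rowid.
  res <- sqldf("select min(rowid) as row, id, string
                from test
                group by id
                order by id")
  print(res)
} else {
  message("sqldf is not installed; skipping the sqldf example")
}
```

The `row` column in the result plays the role of `min(row)` in the question's workaround; it can be dropped afterwards with `res[-1]` if it is not needed.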
