Counting in R data.table

Posted by Simon Z. on Stack Overflow
Published on 2013-11-08T21:50:38Z

Filed under: data.table

I have the following data.table

library(data.table)

set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
DT
    VAL
 1:   1
 2:   2
 3:   2
 4:   3
 5:   1
 6:   3
 7:   3
 8:   2
 9:   2
10:   1

Now I want to perform two tasks:

1. Count the occurrences of each number in VAL.
2. Number the rows within each group of equal VAL (first, second, third occurrence, and so on).

At the end I want the result

    VAL COUNT IDX
 1:   1     3   1
 2:   2     4   1
 3:   2     4   2
 4:   3     3   1
 5:   1     3   2
 6:   3     3   2
 7:   3     3   3
 8:   2     4   3
 9:   2     4   4
10:   1     3   3

where COUNT is the result of task 1 and IDX is the result of task 2.

I tried to work with which and length using .I:

DT[, list(COUNT = length(VAL == VAL[.I]),
          IDX = which(which(VAL == VAL[.I]) == .I))]

but this does not work because .I refers to a vector of row indices, so I guess one must subset it with .I[]. Inside .I[], however, I face the same problem: I do not have the index of the current row. I also know (from reading the data.table FAQ and following the posts here) that looping through rows should be avoided if possible.
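
To see why the attempt does not give per-value results, it can help to inspect .I directly. The following is a minimal sketch, not part of the original attempt. It hardcodes the ten VAL values from the printed table (an assumption, made so the example does not depend on how a given R version draws random samples), and the names all_rows, whole_length and rows_per_group are introduced here purely for illustration.

```r
library(data.table)

# Values copied from the printed table in the question, hardcoded so the
# result does not depend on the R version's sampling algorithm.
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Without `by`, .I is simply the vector of all row numbers.
all_rows <- DT[, .I]

# Hence VAL[.I] is VAL itself, the comparison is TRUE for every row,
# and length() reports the number of rows, not a per-value count.
whole_length <- DT[, length(VAL == VAL[.I])]

# With `by`, .I holds the original row numbers of each group instead.
rows_per_group <- DT[, .(ROWS = list(.I)), by = VAL]
print(rows_per_group)
```

Because the expression is evaluated once for the whole table rather than once per row, there is no "current row" to compare against unless the rows are grouped first.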

So, what's the data.table way?
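
One possible approach, sketched here under the assumption that grouping by VAL is acceptable, uses .N (the number of rows in the current group) for COUNT and seq_len(.N) for IDX. As in the sketch above, the VAL values are hardcoded from the printed table rather than drawn with sample().

```r
library(data.table)

# Values copied from the printed table in the question.
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Within each group of equal VAL:
#   .N          is the group size          -> task 1 (COUNT)
#   seq_len(.N) numbers the rows 1, 2, ... -> task 2 (IDX)
# := adds both columns by reference and keeps the original row order.
DT[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]
print(DT)
```

Using := rather than list() matters here: a grouped list() call would return the rows reordered by group, whereas := leaves the ten rows in place. Newer data.table versions also provide rowid(), which could stand in for the seq_len(.N) part.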
