Counting in R data.table
- by Simon Z.
I have the following data.table
set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
VAL
1: 1
2: 2
3: 2
4: 3
5: 1
6: 3
7: 3
8: 2
9: 2
10: 1
Now I want to to perform two tasks:
Count the occurrences of numbers in VAL.
Count within all rows with the same value VAL (first, second, third occurrence)
At the end I want the result
VAL COUNT IDX
1: 1 3 1
2: 2 4 1
3: 2 4 2
4: 3 3 1
5: 1 3 2
6: 3 3 2
7: 3 3 3
8: 2 4 3
9: 2 4 4
10: 1 3 3
where COUNT defines task 1. and IDX task 2.
I tried to work with which and length using .I:
dt[, list(COUNT = length(VAL == VAL[.I]),
IDX = which(which(VAL == VAL[.I]) == .I))]
but this does not work as .I refers to a vector with the index, so I guess one must use .I[]. Though inside .I[] I again face the problem, that I do not have the row index and I do know (from reading data.table FAQ and following the posts here) that looping through rows should be avoided if possible.
So, what's the data.table way?