Create unique identifier for different row-groups
Posted
by
Max van der Heijden
on Stack Overflow
See other posts from Stack Overflow
or by Max van der Heijden
Published on 2012-11-05T10:58:23Z
Indexed on
2012/11/05
11:00 UTC
Read the original article
Hit count: 201
I want to number certain combinations of row in a dataframe (which is ordered on ID and on Time)
tc <- textConnection('
id time end_yn number
abc 10 0 1
abc 11 0 2
abc 12 1 3
abc 13 0 1
def 10 0 1
def 15 1 2
def 16 0 1
def 17 0 2
def 18 1 3
')
test <- read.table(tc, header=TRUE)
The goal is to create a new column ("journey_nr
") that give a unique number to each row based on the journey it belongs to. Journeys are defined as a sequence of rows per id
up until to end_yn == 1
, also if end_yn
never becomes 1, the journey should also be numbered (see the expected outcome example). It is only possible to have end_yn == 0
journeys at the end of a collection of rows for an ID (as shown at row 4 for id 3). So either no end_yn == 1
has occured for that ID or that happened before the end_yn == 0
-journey (see id == abc
in the example).
I know how to number using the data.table
package, but I do not know which columns to combine in order to get the expected outcome. I've searched the data.table
-tag on SO, but could not find a similar problem.
Expected outcome:
id time end_yn number journey
abc 10 0 1 1
abc 11 0 2 1
abc 12 1 3 1
abc 13 0 1 2
def 10 0 1 3
def 15 1 2 3
def 16 0 1 4
def 17 0 2 4
def 18 1 3 4
© Stack Overflow or respective owner