Create unique identifier for different row-groups
- by Max van der Heijden
I want to number certain combinations of row in a dataframe (which is ordered on ID and on Time)
tc <- textConnection('
id time end_yn number
abc 10 0 1
abc 11 0 2
abc 12 1 3
abc 13 0 1
def 10 0 1
def 15 1 2
def 16 0 1
def 17 0 2
def 18 1 3
')
test <- read.table(tc, header=TRUE)
The goal is to create a new column ("journey_nr") that give a unique number to each row based on the journey it belongs to. Journeys are defined as a sequence of rows per id up until to end_yn == 1, also if end_ynnever becomes 1, the journey should also be numbered (see the expected outcome example). It is only possible to have end_yn == 0 journeys at the end of a collection of rows for an ID (as shown at row 4 for id 3). So either no end_yn == 1 has occured for that ID or that happened before the end_yn == 0-journey (see id == abc in the example).
I know how to number using the data.table package, but I do not know which columns to combine in order to get the expected outcome. I've searched the data.table-tag on SO, but could not find a similar problem.
Expected outcome:
id time end_yn number journey
abc 10 0 1 1
abc 11 0 2 1
abc 12 1 3 1
abc 13 0 1 2
def 10 0 1 3
def 15 1 2 3
def 16 0 1 4
def 17 0 2 4
def 18 1 3 4