Using Python to traverse a parent-child data set
- by user132748
I have a dataset of two columns in a CSV file. The purpose of this dataset is to provide a link between two different IDs if they belong to the same person (for example, 2, 3 and 5 belong to 1).
For example:
COLA COLB
1 2
1 3
1 5
2 6
3 7
9 10
In the above example, 1 is linked to 2, 3 and 5, 2 is linked to 6, and 3 is linked to 7.
What I am trying to achieve is to identify all records that are linked to 1 either directly (2, 3, 5) or indirectly (6, 7), so that I can say these IDs in column B belong to the same person as the ID in column A. I then want to either dedupe the records or add a new column to the output file that is populated with 1 for every row that links to 1.
Example of expected output:
colA colB GroupField
1 2 1
1 3 1
1 5 1
2 6 1
3 7 1
9 10 9
10 11 9
I am a newbie, so I am not sure how to approach this problem. I would appreciate any input you all can provide.
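
One possible way to approach this is to treat every row as a link between two IDs and merge linked IDs into groups with a union-find (disjoint-set) structure. The sketch below is only an illustration, not a definitive solution. It assumes the file is comma-delimited with a single header row, that the IDs are integers, and that the group label should be the smallest ID in each group (which gives 1 and 9 for the sample data). The names find, union, group_rows and label_csv are made up for this example.

```python
import csv
import io


def find(parent, x):
    """Return the representative (root) ID of the group containing x."""
    parent.setdefault(x, x)          # an unseen ID starts as its own group
    root = x
    while parent[root] != root:      # walk up to the root
        root = parent[root]
    while parent[x] != root:         # path compression: point nodes at the root
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def union(parent, a, b):
    """Merge the groups of a and b, keeping the smaller root as the label."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def group_rows(rows):
    """Take (a, b) pairs and return (a, b, group) triples."""
    parent = {}
    for a, b in rows:
        union(parent, a, b)
    return [(a, b, find(parent, a)) for a, b in rows]


def label_csv(src, dst):
    """Read two-column rows from src and write them to dst with a group column.

    Assumes a header row and integer IDs (assumptions, not from the question).
    """
    reader = csv.reader(src)
    header = next(reader)
    rows = [(int(r[0]), int(r[1])) for r in reader if r]  # skip blank lines
    writer = csv.writer(dst)
    writer.writerow(header + ["GroupField"])
    writer.writerows(group_rows(rows))


# Rows from the sample data in the question
sample = [(1, 2), (1, 3), (1, 5), (2, 6), (3, 7), (9, 10)]
for row in group_rows(sample):
    print(*row)
```

For real files, label_csv could be called with two file objects opened with open(..., newline=""). Because each merge keeps the smaller root, every group ends up labelled with its smallest ID. If the label should instead be the ID that never appears in column B, the union step would need to be adjusted.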