Using Python to traverse a parent-child data set
Posted by user132748 on Programmers
Published on 2014-05-28T01:15:20Z
I have a dataset of two columns in a CSV file. The purpose of this dataset is to provide a link between two different ids if they belong to the same person. For example, in the data below, ids 2, 3 and 5 belong to 1:
COLA COLB
1    2
1    3
1    5
2    6
3    7
9    10
In the above example, 1 is linked to 2, 3 and 5, 2 is linked to 6, and 3 is linked to 7. What I am trying to achieve is to identify all records that are linked to 1 directly (2, 3, 5) or indirectly (6, 7), so that I can say these ids in column B belong to the same person in column A. I then want to either dedupe, or add a new column to the output file that is populated with 1 for every row that links to 1.
Example of the expected output:
colA colB GroupField
1    2    1
1    3    1
1    5    1
2    6    1
3    7    1
9    10   9
10   11   9
I am a newbie and am not sure how to approach this problem. I would appreciate any input you can provide.
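One possible way to approach this is to treat each row as an edge between two ids and group the ids that are connected directly or indirectly, for instance with a small union-find (disjoint-set) structure. The sketch below is only an illustration under some assumptions that are not stated in the question: the ids are integers, the file is comma-delimited with a header row, and the group label is the smallest id in each group (which matches 1 and 9 in the expected output). The names find, union, label_groups and add_group_column are made up for this example.

```python
import csv


def find(parent, x):
    """Return the representative id of the group that x belongs to."""
    parent.setdefault(x, x)  # an unseen id starts as its own group
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every id on the path straight at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent, a, b):
    """Merge the groups of a and b; the smaller root id stays representative."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def label_groups(pairs):
    """Return (colA, colB, group) for each input pair, in input order."""
    parent = {}
    for a, b in pairs:
        union(parent, a, b)
    return [(a, b, find(parent, a)) for a, b in pairs]


def add_group_column(in_path, out_path):
    """Read a two-column CSV and write it back out with a GroupField column.

    Assumes a header row, a comma delimiter and integer ids.
    """
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        pairs = [(int(row[0]), int(row[1])) for row in reader if row]
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header + ["GroupField"])
        writer.writerows(label_groups(pairs))


if __name__ == "__main__":
    sample = [(1, 2), (1, 3), (1, 5), (2, 6), (3, 7), (9, 10)]
    for row in label_groups(sample):
        print(*row)
```

Because the smaller root always wins a merge, each group ends up labelled with its smallest id, so rows 2 6 and 3 7 get group 1 even though they never mention 1 directly. If the label should instead be the id that only ever appears in column A, the labelling step would need to change.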