A graph-based tuple merge?
- by user1644030
I have paired values in tuples that are related matches (and technically still in CSV files). Neither of the paired values are necessarily unique.
tupleAB = (A####, B###), (A###, B###), (A###, B###)...
tupleBC = (B####, C###), (B###, C###), (B###, C###)...
tupleAC = (A####, C###), (A###, C###), (A###, C###)...
My ideal output would be a dictionary with a unique ID and a list of "reinforced" matches. The way I try to think about it is in a graph-based context.
For example, if:
tupleAB[x] = (A0001, B0012)
tupleBC[y] = (B0012, C0230)
tupleAC[z] = (A0001, C0230)
This would produce:
output = {uniquekey0001, [A0001, B0012, C0230]}
Ideally, this would also be able to scale up to more than three tuples (for example, adding a "D" match that would result in an additional three tuples - AD, BD, and CD - and lists of four items long; and so forth).
In regards to scaling up to more tuples, I am open to having "graphs" that aren't necessarily fully connected, i.e., every node connected to every other node. My hunch is that I could easily filter based on the list lengths. I am open to any suggestions.
I think, with a few cups of coffee, I could work out a brute force solution, but I thought I'd ask the community if anyone was aware of a more elegant solution. Thanks for any feedback.