A graph-based tuple merge?

Posted by user1644030 on Stack Overflow See other posts from Stack Overflow or by user1644030
Published on 2012-09-05T21:36:17Z Indexed on 2012/09/05 21:37 UTC
Read the original article Hit count: 280

I have paired values in tuples that are related matches (and technically still in CSV files). Neither of the paired values are necessarily unique.

tupleAB = (A####, B###), (A###, B###), (A###, B###)...
tupleBC = (B####, C###), (B###, C###), (B###, C###)...
tupleAC = (A####, C###), (A###, C###), (A###, C###)...

My ideal output would be a dictionary with a unique ID and a list of "reinforced" matches. The way I try to think about it is in a graph-based context.

For example, if:

tupleAB[x] = (A0001, B0012)
tupleBC[y] = (B0012, C0230)
tupleAC[z] = (A0001, C0230)

This would produce:

output = {uniquekey0001, [A0001, B0012, C0230]}

Ideally, this would also be able to scale up to more than three tuples (for example, adding a "D" match that would result in an additional three tuples - AD, BD, and CD - and lists of four items long; and so forth).

In regards to scaling up to more tuples, I am open to having "graphs" that aren't necessarily fully connected, i.e., every node connected to every other node. My hunch is that I could easily filter based on the list lengths. I am open to any suggestions.

I think, with a few cups of coffee, I could work out a brute force solution, but I thought I'd ask the community if anyone was aware of a more elegant solution. Thanks for any feedback.

© Stack Overflow or respective owner

Related posts about python

Related posts about algorithm