Problem background
Imagine this problem. You have a water molecule which is in contact with other molecules (if the contact is a hydrogen bond, there can be 4 other molecules around my water). Like in the following picture (A, B, C, D are some other atoms and dots mean the contact).
A B
. .
O
/ \
H H
. .
C D
I have the information about all the dots and I need to eliminate the water in the center and create records describing contacts of A-C, A-D, A-B, B-C, B-D, and C-D.
Database structure
Currently, I have the following structure in the database:
Table atoms:
"id" integer PRIMARY KEY,
"amino" char(3) NOT NULL, (HOH for water or other value)
other columns identifying the atom
Table contacts:
"acceptor_id" integer NOT NULL, (the atom near to my hydrogen, here C or D)
"donor_id" integer NOT NULL, (here A or B)
"directness" char(1) NOT NULL, (this should be D for direct and W for water-mediated)
other columns about the contact, such as the distance
Current solution (insufficient)
Now, I'm going through all the contacts which have donor.amino = "HOH". In this sample case, this would select contacts from C and D. For each of these selected contacts, I look up contacts having the same acceptor_id as is the donor_id in the currently selected contact. From this information, I create the new contact. At the end, I delete all contacts to or from HOH.
This way, I am obviously unable to create C-D and A-B contacts (the other 4 are OK).
If I try a similar approach - trying to find two contacts having the same donor_id, I end up with duplicate contacts (C-D and D-C).
Is there a simple way to retrieve all six contacts without duplicates?
I'm dreaming about some one page long SQL query which retrievs just these six wanted rows. :-)
It is preferable to conserve information about who is donor where possible, but not strictly necessary.
Big thanks to all of you who read this question to this point.