What's the standard algorithm for syncing two lists of objects?
- by Oliver Giesen
I'm pretty sure this must be in some kind of text book (or more likely in all of them) but I seem to be using the wrong keywords to search for it... :(
A common task I'm facing while programming is that I am dealing with lists of objects from different sources which I need to keep in sync somehow. Typically there's some sort of "master list" e.g. returned by some external API and then a list of objects I create myself each of which corresponds to an object in the master list. Sometimes the nature of the external API will not allow me to do a live sync: For instance the external list might not implement notifications about items being added or removed or it might notify me but not give me a reference to the actual item that was added or removed. Furthermore, refreshing the external list might return a completely new set of instances even though they still represent the same information so simply storing references to the external objects might also not always be feasible. Another characteristic of the problem is that both lists cannot be sorted in any meaningful way. You should also assume that initializing new objects in the "slave list" is expensive, i.e. simply clearing and rebuilding it from scratch is not an option.
So how would I typically tackle this? What's the name of the algorithm I should google for?
In the past I have implemented this in various ways (see below for an example) but it always felt like there should be a cleaner and more efficient way.
Here's an example approach:
Iterate over the master list
Look up each item in the "slave list"
Add items that do not yet exist
Somehow keep track of items that already exist in both lists (e.g. by tagging them or keeping yet another list)
When done iterate once more over the slave list
Remove all objects that have not been tagged (see 4.)
Update
Thanks for all your responses so far! I will need some time to look at the links.
Maybe one more thing worthy of note: In many of the situations where I needed this the implementation of the "master list" is completely hidden from me. In the most extreme cases the only access I might have to the master list might be a COM-interface that exposes nothing but GetFirst-, GetNext-style methods. I'm mentioning this because of the suggestions to either sort the list or to subclass it both of which is unfortunately not practical in these cases unless I copy the elements into a list of my own and I don't think that would be very efficient.
I also might not have made it clear enough that the elements in the two lists are of different types, i.e. not assignment-compatible: Especially, the elements in the master list might be available as interface references only.