How do we keep dependent data structures up to date?
- by Geo
Suppose you have a parse tree, an abstract syntax tree, and a control flow graph, each one logically derived from the one before. In principle it is easy to construct each graph given the parse tree, but how can we manage the complexity of updating the graphs when the parse tree is modified? We know exactly how the tree has been modified, but how can the change be propagated to the other trees in a way that doesn't become difficult to manage?
Naturally the dependent graph can be updated by simply reconstructing it from scratch every time the first graph changes, but then there would be no way of knowing the details of the changes in the dependent graph.
I currently have four ways to attempt to solve this problem, but each one has difficulties.
Nodes of the dependent tree each observe the relevant nodes of the original tree, updating themselves and the observer lists of original tree nodes as necessary. The conceptual complexity of this can become daunting.
Each node of the original tree has a list of the dependent tree nodes that specifically depend upon it, and when the node changes it sets a flag on the dependent nodes to mark them as dirty, including the parents of the dependent nodes all the way down to the root. After each change we run an algorithm that is much like the algorithm for constructing the dependent graph from scratch, but it skips over any clean node and reconstructs each dirty node, keeping track of whether the reconstructed node is actually different from the dirty node. This can also get tricky.
We can represent the logical connection between the original graph and the dependent graph as a data structure, like a list of constraints, perhaps designed using a declarative language. When the original graph changes we need only scan the list to discover which constraints are violated and how the dependent tree needs to change to correct the violation, all encoded as data.
We can reconstruct the dependent graph from scratch as though there were no existing dependent graph, and then compare the existing graph and the new graph to discover how it has changed. I'm sure this is the easiest way because I know there are algorithms available for detecting differences, but they are all quite computationally expensive and in principle it seems unnecessary so I'm deliberately avoiding this option.
What is the right way to deal with these sorts of problems? Surely there must be a design pattern that makes this whole thing almost easy. It would be nice to have a good solution for every problem of this general description. Does this class of problem have a name?