Consider train timetable data, where each service (or "run") has a data structure as such:
public class TimeTable
{
public int Id {get;set;}
public List<Run> Runs {get;set;}
}
public class Run
{
public List<Stop> Stops {get;set;}
public int RunId {get;set;}
}
public class Stop
{
public int StationId {get;set;}
public TimeSpan? StopTime {get;set;}
public bool IsStop {get;set;}
}
We have a list of runs that operate against a particular line (the TimeTable class).
Further, whilst we have a set collection of stations that are on a line, not all runs stop at all stations (that is, IsStop would be false, and StopTime would be null).
Now, imagine that we have received the initial timetable, processed it, and loaded it into the above data structure. Once the initial load is complete, it is persisted into a database - the data structure is used only to load the timetable from its source and to persist it to the database.
We are now receiving an updated timetable.
The updated timetable may or may not have any changes to it - we don't know and are not told whether any changes are present.
What I would like to do is perform a compare for each run in an efficient manner. I don't want to simply replace each run. Instead, I want to have a background task that runs periodically that downloads the updated timetable dataset, and then compares it to the current timetable. If differences are found, some action (not relevant to the question) will take place.
I was initially thinking of some sort of checksum process, where I could, for example, load both runs (that is, the one from the new timetable received and the one that has been persisted to the database) into the data structure and then add up all the hour components of the StopTime, and all the minute components of the StopTime and compare the results (i.e. both the sum of Hours and sum of Minutes would be the same, and differences introduced if a stop time is changed, a stop deleted or a new stop added).
Would that be a valid way to check for differences, or is there a better way to approach this problem? I can see a problem that, for example, one stop is changed to be 2 minutes earlier, and another changed to be 2 minutes later would have a net zero change.
Or am I over thinking this, and would it just be simpler to brute check all stops to ensure that
The updated run stops at the same stations; and
Each stop is at the same time