Setup
I have an entity-component architecture where Entities can have a set of attributes (which are pure data with no behavior) and there exist systems that run the entity logic which act on that data. Essentially, in somewhat pseudo-code:
Entity
{
id;
map<id_type, Attribute> attributes;
}
System
{
update();
vector<Entity> entities;
}
A system that just moves along all entities at a constant rate might be
MovementSystem extends System
{
update()
{
for each entity in entities
position = entity.attributes["position"];
position += vec3(1,1,1);
}
}
Essentially, I'm trying to parallelise update() as efficiently as possible. This can be done by running entire systems in parallel, or by giving each update() of one system a couple of components so different threads can execute the update of the same system, but for a different subset of entities registered with that system.
Problem
In reality, these systems sometimes require that entities interact(/read/write data from/to) each other, sometimes within the same system (e.g. an AI system that reads state from other entities surrounding the current processed entity), but sometimes between different systems that depend on each other (i.e. a movement system that requires data from a system that processes user input).
Now, when trying to parallelize the update phases of entity/component systems, the phases in which data (components/attributes) from Entities are read and used to compute something, and the phase where the modified data is written back to entities need to be separated in order to avoid data races.
Otherwise the only way (not taking into account just "critical section"ing everything) to avoid them is to serialize parts of the update process that depend on other parts. This seems ugly. To me it would seem more elegant to be able to (ideally) have all processing running in parallel, where a system may read data from all entities as it wishes, but doesn't write modifications to that data back until some later point.
The fact that this is even possible is based on the assumption that modification write-backs are usually very small in complexity, and don't require much performance, whereas computations are very expensive (relatively). So the overhead added by a delayed-write phase might be evened out by more efficient updating of entities (by having threads work more % of the time instead of waiting).
A concrete example of this might be a system that updates physics. The system needs to both read and write a lot of data to and from entities.
Optimally, there would be a system in place where all available threads update a subset of all entities registered with the physics system. In the case of the physics system this isn't trivially possible because of race conditions. So without a workaround, we would have to find other systems to run in parallel (which don't modify the same data as the physics system), other wise the remaining threads are waiting and wasting time.
However, that has disadvantages
Practically, the L3 cache is pretty much always better utilized when updating a large system with multiple threads, as opposed to multiple systems at once, which all act on different sets of data.
Finding and assembling other systems to run in parallel can be extremely time consuming to design well enough to optimize performance. Sometimes, it might even not be possible at all because a system just depends on data that is touched by all other systems.
Solution?
In my thinking, a possible solution would be a system where reading/updating and writing of data is separated, so that in one expensive phase, systems only read data and compute what they need to compute, and then in a separate, performance-wise cheap, write phase, attributes of entities that needed to be modified are finally written back to the entities.
The Question
How might such a system be implemented to achieve optimal performance, as well as making programmer life easier? What are the implementation details of such a system and what might have to be changed in the existing EC-architecture to accommodate this solution?