In the case of the shown MovementSystem, parallelization is trivial: since entities don't depend on each other and don't modify shared data, we could simply move all entities in parallel.
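To make the trivially parallel case concrete, here is a minimal sketch (the `Position`/`Velocity` components and function names are my own illustration, since the MovementSystem code isn't reproduced here). Each thread updates a disjoint range of entities, so no synchronization is needed:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical component data; names are illustrative, not from the original post.
struct Position { float x = 0, y = 0; };
struct Velocity { float dx = 0, dy = 0; };

// Each entity owns independent data, so we can split the entity range
// across threads with no synchronization at all.
void move_all_parallel(std::vector<Position>& pos,
                       const std::vector<Velocity>& vel,
                       float dt,
                       unsigned thread_count = std::thread::hardware_concurrency())
{
    if (thread_count == 0) thread_count = 1;
    std::vector<std::thread> workers;
    const std::size_t n = pos.size();
    const std::size_t chunk = (n + thread_count - 1) / thread_count;

    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                pos[i].x += vel[i].dx * dt;   // writes only entity i's own data
                pos[i].y += vel[i].dy * dt;
            }
        });
    }
    for (auto& w : workers) w.join();
}
```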
However, these systems sometimes require that entities interact with each other (reading and writing each other's data), sometimes within the same system (e.g. an AI system that reads state from the entities surrounding the currently processed entity), but often between different systems that depend on each other (e.g. a movement system that requires data from a system that processes user input).
For example, in a physics system entities may sometimes interact with each other: two objects collide, their positions, velocities, and other attributes are read from them, used to compute the collision response, and then the updated attributes are written back to both entities.
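As an illustration of that read/compute/write-back pattern, here is a sketch of a one-dimensional elastic collision response (the `Body` struct and the formulas are my own example, not from the post). Note that resolving a single collision mutates *both* entities, which is exactly why two threads resolving collisions that share a body would race:

```cpp
struct Body { float x, v, mass; };

// Resolving one collision reads from *both* bodies and writes back to both,
// so two threads resolving collisions that share a body would race.
void resolve_elastic_collision(Body& a, Body& b)
{
    // Read phase: pull everything we need out of both entities.
    const float m1 = a.mass, m2 = b.mass;
    const float v1 = a.v,    v2 = b.v;

    // Compute phase: standard 1-D elastic collision formulas.
    const float new_v1 = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2);
    const float new_v2 = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2);

    // Write phase: both entities are mutated.
    a.v = new_v1;
    b.v = new_v2;
}
```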
And before the rendering system in the engine can start rendering entities, it has to wait for the other systems to complete execution, to ensure that all relevant attributes are what they need to be.
If we try to blindly parallelize this, it will lead to classical race conditions, where different systems may read and modify the same data at the same time.
Ideally, there would exist a solution where each system may read data from any entities it wishes to, without having to worry about other systems modifying that same data at the same time, and without the programmer having to care about properly ordering the execution and parallelization of these systems manually (which may sometimes not even be possible).
In a basic implementation, this could be achieved by simply putting all data reads and writes in critical sections (guarding them with mutexes). But this induces a large amount of runtime overhead and is probably not suitable for performance-sensitive applications. And manually scheduling different systems to run in parallel has its own disadvantages:
- Practically, the L3 cache is pretty much always better utilized when updating a large system with multiple threads, as opposed to multiple systems at once, which all act on different sets of data.
- Finding and assembling other systems to run in parallel can be extremely time consuming to design well enough to optimize performance. Sometimes, it might not even be possible at all, because a system just depends on data that is touched by all other systems.
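For reference, the "critical-section everything" baseline might look like the following sketch (all names here are illustrative). Correctness is easy to get this way, but every single access, including pure reads, pays for a lock:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Naive "critical section everything" approach: every component access
// goes through a lock. Correct, but every read pays synchronization cost.
struct GuardedFloatStore {
    std::vector<float> values;
    mutable std::mutex m;

    float read(std::size_t i) const {
        std::lock_guard<std::mutex> lock(m);   // even pure reads serialize here
        return values[i];
    }
    void write(std::size_t i, float v) {
        std::lock_guard<std::mutex> lock(m);
        values[i] = v;
    }
};
```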
In my thinking, a possible solution would be a design where reading/updating and writing of data are separated: in one expensive phase, systems only read data, compute what they need to compute, and somehow cache the results. All systems would act on the data in the state it was in at the beginning of the frame. Then, before the end of the frame, when all systems have finished updating, a serialized (and performance-wise cheap) writing pass happens, in which the cached results from all the different systems are iterated through and written back to the target entities.
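A minimal sketch of what I have in mind (the `Health` component and the two systems are made-up examples): systems run in parallel against the frame-start state and only *record* their intended writes, and a single-threaded pass applies all cached writes at the end of the frame. Storing deltas rather than absolute values is one way to keep the result independent of the order in which systems finish:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical component; all names here are illustrative.
struct Health { int hp = 100; };

// A cached modification: "add `delta` to entity `target`'s hp".
// Storing deltas (instead of absolute values) makes the write pass
// independent of the order in which systems finished.
struct PendingWrite { std::size_t target; int delta; };

// Read/compute phase: a system scans the frame-start state and only
// records what it wants to change; the world is never mutated here,
// so many such systems can safely run on separate threads at once.
std::vector<PendingWrite> poison_system(const std::vector<Health>& snapshot)
{
    std::vector<PendingWrite> out;
    for (std::size_t i = 0; i < snapshot.size(); ++i)
        out.push_back({i, -1});
    return out;
}

std::vector<PendingWrite> regen_system(const std::vector<Health>& snapshot)
{
    std::vector<PendingWrite> out;
    for (std::size_t i = 0; i < snapshot.size(); ++i)
        if (snapshot[i].hp < 100)
            out.push_back({i, +5});
    return out;
}

void run_frame(std::vector<Health>& world)
{
    // Both systems read the same frame-start state concurrently; no writes yet.
    std::vector<PendingWrite> poison, regen;
    std::thread t1([&] { poison = poison_system(world); });
    std::thread t2([&] { regen  = regen_system(world);  });
    t1.join();
    t2.join();

    // Serialized write pass at the end of the frame: iterate every system's
    // cached results and apply them to the real entities.
    for (const auto* buf : { &poison, &regen })
        for (const auto& w : *buf)
            world[w.target].hp += w.delta;
}
```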
This is based on the (maybe wrong?) idea that the easy parallelization win could be big enough to outweigh the cost (both in terms of runtime performance as well as code overhead) of the result caching and the writing pass.
How might such a system be implemented to achieve optimal performance, as well as to make the programmer's life easier? What are the implementation details of such a system, and what are the prerequisites for an Entity-Component system that wants to use this solution?