reworked question, I think it was a bit too verbose before
TravisG


In the case of the shown MovementSystem, parallelization is trivial. Since entities don't depend on each other, and don't modify shared data, we could just move all entities in parallel.

However, these systems sometimes require that entities interact with (read/write data from/to) each other, sometimes within the same system, but often between different systems that depend on each other.

For example, in a physics system entities may sometimes interact with each other: two objects collide, their positions, velocities and other attributes are read, updated, and then written back to both entities.

And before the rendering system in the engine can start rendering entities, it has to wait for other systems to complete execution to ensure that all relevant attributes are what they need to be.
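One way to make that ordering constraint explicit is to declare which systems depend on which, and schedule them topologically. This is just a sketch; the system names and dependencies below are assumptions for illustration, not from any particular engine:

```python
from graphlib import TopologicalSorter

# Hypothetical system dependency graph: each system maps to the set of
# systems whose output it reads.
dependencies = {
    "movement": {"input"},                 # movement reads processed input
    "physics": {"movement"},               # physics runs on updated positions
    "rendering": {"physics", "movement"},  # render only after updates finish
}

# Each system is scheduled only after everything it depends on has run.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['input', 'movement', 'physics', 'rendering']
```

Systems with no path between them in this graph could then, in principle, run in parallel.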

If we try to blindly parallelize this, it will lead to classic race conditions where different systems read and modify the same data at the same time.

Ideally, there would exist a solution where every system may read data from any entity it wishes to, without having to worry about other systems modifying that same data at the same time, and without the programmer having to care about properly ordering the execution and parallelization of these systems manually (which may sometimes not even be possible).

In a basic implementation, this could be achieved by simply putting all data reads and writes in critical sections (guarding them with mutexes). But this induces a large amount of runtime overhead and is probably not suitable for performance-sensitive applications.
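For illustration, a minimal sketch of that critical-section approach (the store and all names here are hypothetical, not from the question's engine):

```python
import threading

# Hypothetical minimal attribute store: every single read and write takes
# a lock, which is safe but pays locking overhead on each access.
class LockedStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._attrs = {}  # (entity_id, attribute_name) -> value

    def read(self, entity_id, name):
        with self._lock:  # critical section around every read...
            return self._attrs.get((entity_id, name))

    def write(self, entity_id, name, value):
        with self._lock:  # ...and around every write
            self._attrs[(entity_id, name)] = value

store = LockedStore()
store.write(1, "position", (0, 0, 0))
print(store.read(1, "position"))  # (0, 0, 0)
```

With many threads hammering the same lock, most of the time goes into contention rather than useful work, which is exactly the overhead problem described above.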

In my thinking, a possible solution would be a design in which reading/updating and writing of data are separated: in one expensive phase, systems only read data, compute what they need to compute, and somehow cache the results. All systems would act on the data in the state it was in at the beginning of the frame. Then, before the end of the frame, when all systems have finished updating, a serialized writing pass iterates through the cached results from all the different systems and writes them back to the target entities.
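As a sketch, such a frame might look like the following; the attribute layout and the movement system are made-up stand-ins, and real systems would be far more involved:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the read-then-write split: each system reads the frame-start
# state freely and returns its writes as a cached list of commands; a
# single serialized pass applies all cached writes at the end of the frame.

def movement_system(state):
    # Read-only phase: compute new positions from the frame-start state.
    return [("position", eid, tuple(c + 1 for c in pos))
            for eid, pos in state["position"].items()]

def run_frame(state, systems):
    # Expensive phase: all systems run in parallel against the same
    # (conceptually immutable) frame-start state.
    with ThreadPoolExecutor() as pool:
        buffers = list(pool.map(lambda system: system(state), systems))
    # Cheap serialized phase: apply every cached write back.
    for buffer in buffers:
        for attr, eid, value in buffer:
            state[attr][eid] = value
    return state

state = {"position": {1: (0, 0, 0), 2: (5, 5, 5)}}
run_frame(state, [movement_system])
print(state["position"][1])  # (1, 1, 1)
```

Note that with this scheme, two systems writing the same attribute of the same entity still need a conflict policy (e.g. a fixed system order during the write pass).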

This is based on the (maybe wrong?) idea that the easy parallelization win could be big enough to outweigh the cost (both in terms of runtime performance and code overhead) of the result caching and the writing pass.

How might such a system be implemented to achieve optimal performance? What are the implementation details of such a system and what are the prerequisites for an Entity-Component system that wants to use this solution?

added 3713 characters in body


Setup

I have an entity-component architecture where Entities can have a set of attributes (which are pure data with no behavior), and there are systems that run the entity logic and act on that data. Essentially, in somewhat pseudo-code:

Entity
{
    id;
    map<id_type, Attribute> attributes;
}

System
{
    update();
    vector<Entity> entities;
}

A system that just moves all entities along at a constant rate might be:

MovementSystem extends System
{
   update()
   {
      for each entity in entities
        position = entity.attributes["position"];
        position += vec3(1,1,1);
   }
}

Essentially, I'm trying to parallelize update() as efficiently as possible. This can be done by running entire systems in parallel, or by splitting the entities registered with a single system across several threads, so that different threads execute the update of the same system, but each for a different subset of those entities.
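The second option, splitting one system's entities across threads, might look roughly like this. The entity layout is a stand-in, and this is only safe without locks because the subsets are disjoint and no entity reads another entity's data:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: one system's update() executed by several threads, each thread
# handling a disjoint subset of the registered entities.

def update_subset(positions, entity_ids):
    # No locking needed: each thread touches only its own entity ids.
    for eid in entity_ids:
        x, y, z = positions[eid]
        positions[eid] = (x + 1, y + 1, z + 1)

def parallel_update(positions, num_threads=4):
    ids = list(positions)
    chunks = [ids[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for chunk in chunks:
            pool.submit(update_subset, positions, chunk)
    # Leaving the with-block waits for all submitted updates to finish.

positions = {i: (0, 0, 0) for i in range(8)}
parallel_update(positions)
print(positions[0])  # (1, 1, 1)
```

This works for the MovementSystem precisely because its entities are independent; the rest of the question is about what to do when they are not.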

Problem

In reality, these systems sometimes require that entities interact with (read/write data from/to) each other, sometimes within the same system (e.g. an AI system that reads state from other entities surrounding the currently processed entity), but sometimes between different systems that depend on each other (e.g. a movement system that requires data from a system that processes user input).

Now, when trying to parallelize the update phases of entity/component systems, the phase in which data (components/attributes) from entities is read and used to compute something, and the phase in which the modified data is written back to the entities, need to be separated in order to avoid data races.

Otherwise, the only way to avoid them (apart from simply wrapping everything in critical sections) is to serialize the parts of the update process that depend on other parts. This seems ugly. To me, it would seem more elegant to (ideally) have all processing run in parallel, where a system may read data from all entities as it wishes, but doesn't write its modifications back until some later point.

The fact that this is even possible rests on the assumption that modification write-backs are usually very small in complexity and cost little performance, whereas the computations are (relatively) very expensive. So the overhead added by a delayed-write phase might be evened out by more efficient updating of entities (threads spending a larger share of their time working instead of waiting).

A concrete example of this might be a system that updates physics. The system needs to both read and write a lot of data to and from entities.

Optimally, there would be a system in place where all available threads update a subset of all entities registered with the physics system. In the case of the physics system, this isn't trivially possible because of race conditions. So without a workaround, we would have to find other systems to run in parallel (ones that don't modify the same data as the physics system); otherwise, the remaining threads are waiting and wasting time.

However, that has disadvantages:

  1. Practically, the L3 cache is almost always better utilized when updating one large system with multiple threads than when updating multiple systems at once, each acting on a different set of data.
  2. Finding and assembling other systems to run in parallel can be extremely time-consuming to design well enough to optimize performance. Sometimes it might not even be possible at all, because a system simply depends on data that is touched by all other systems.

Solution?

In my thinking, a possible solution would be a system where reading/updating and writing of data are separated, so that in one expensive phase, systems only read data and compute what they need to compute, and then in a separate, performance-wise cheap write phase, attributes of entities that needed to be modified are finally written back to the entities.
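One possible way to realize such a split is double buffering, which I'm sketching here as one variant under my own assumptions (the attribute layout and system are made up): every system reads last frame's values from a front buffer and writes only into a back buffer, and the buffers are swapped at the end of the frame.

```python
import copy

# Double-buffering sketch: reads always hit the untouched front buffer,
# writes always go to the back buffer, so no read can race with a write.

def physics_system(read_state, write_state):
    # read_state maps entity id -> (position, velocity)
    for eid, (pos, vel) in read_state.items():
        new_pos = tuple(p + v for p, v in zip(pos, vel))
        write_state[eid] = (new_pos, vel)  # write only into the back buffer

front = {1: ((0, 0, 0), (1, 0, 0))}
back = copy.deepcopy(front)

physics_system(front, back)
front, back = back, front  # swap buffers at the end of the frame

print(front[1][0])  # (1, 0, 0)
```

The trade-off versus a cached write pass is memory: every buffered attribute exists twice, but no per-system result cache or serialized commit loop is needed.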

The Question

How might such a system be implemented to achieve optimal performance, as well as make the programmer's life easier? What are the implementation details of such a system, and what might have to be changed in the existing EC architecture to accommodate this solution?

Tweeted twitter.com/#!/StackGameDev/status/374548959160971264

Efficiently separating Read/Compute/Write steps for concurrent processing of entities in Entity/Component systems

When trying to parallelize the update phases of entity/component systems, the phase in which data (components/attributes) from entities is read and used to compute something, and the phase in which the modified data is written back to the entities, need to be separated in order to avoid data races.

How can this be achieved in the most efficient (in terms of execution performance) and elegant manner? Given a parallel architecture that allows me to hand each thread a set of entities to update, what systems are available to separate read/modify/write (e.g. messaging?), and what are the implementation details to make them as fast as possible?