Algorithm / Data Structure for Finding Which of Many Sets are Subsets of another Set

Question

Abstract Description:

I have a set of strings, call it the "active set", and a set of sets of strings - call that the "possible set". When a new string is added to the active set, sets from the possible set may suddenly be subsets of the active set because the active set lacked only that string to be a superset of one of the possibles. I need an algorithm to efficiently find these when I add a new string to the active set. Bonus points if the same data structure allows me to efficiently find which of these possible sets are invalidated (no longer a subset) when a string is removed from the active set.

(The reason I framed the problem described below in terms of sets and subsets of strings in the abstract above is that the language I'm writing this in (Io) is dynamically typed. Objects do have a "type" field but it is a string with the name of the object type in it.)

Background:

In my game engine I have GameObjects which can have several types of Representation objects added to them. For instance if a GameObject has physical presence it might have a PhysicsRepresentation added to it (or not if it's not a solid object). It might have various kinds of GraphicsRepresentations added to it, such as a mesh or particle effect (and you can have more than one if you have multiple visual effects attached to the same game object).

The point of this is to separate subsystems, but you can't completely separate everything: for instance when a GameObject has both a PhysicsRepresentation and a GraphicsRepresentation, something needs to create a 3rd object which connects the position of the GraphicsRepresentation to the location of the PhysicsRepresentation. To serve this purpose while still keeping all the components separate, I have Interaction objects. The Interaction object encapsulates the cross-cutting knowledge about how two system components have to interact.

But in order to protect GameObject from having to know too much about Representations and Interactions, GameObject just provides a generic registry where Interaction prototype objects can register to be called when a particular combination of Representations is present in the GameObject. When a new Representation is added to the GameObject, GameObject should look in it's registry and activate just those Interaction objects which are newly enabled by the presence of the new Representation, plus the existing Representations.

I'm just stuck on what data structure should be used for this registry and how to search it.

Errata:

The sets of strings are not necessarily sorted, but I can choose to store them sorted.

Although an Interaction most commonly will be between two Representations, I do not want to limit it to that; I should be able to have Interactions that trigger with 3 or more different representations, or even interactions that trigger based on just 1 representation.

I want to optimize this for the case of making it as fast as possible to add/remove representations.

I will have many active sets (each game object has an active set), but I have only one possible set (the set of all registered interaction types). So I don't care how long it takes to build the data structure that represents the possible set, because it only needs to be done once provided the algorithm for comparing different active sets is non-destructive of the possible set data structure.

Does it really matter how quickly you can add and remove representations? How often are these really going to change while playing the game? A dozen times at startup as representations are added, then once each time the user hits the button labeled Forward display to my phone and then Play on the computer again ? I'd have to assume that routing events up and down the layers will happen millions of times for each add / remove representation layer... — sarnold
– sarnold, Commented Dec 31, 2011 at 3:36
@sarnold - Whenever a GameObject is created or destroyed, multiple representations are added / removed. So every time a new bullet spawns, every time a spell is cast, etc., you could have 3 or 4 representations added when the game object is created and removed when the game object is destroyed (so that the interaction objects can do cleanup if necessary). — Dennis
– Dennis, Commented Dec 31, 2011 at 4:50
@dasblinkenlight - The cardinality of my sets is small. I don't know in advance how many kinds there will ever be, but right now there will be about 4 different kinds of representation and 4 different kinds of interaction. So yeah (as sarnold said) it might not matter that much. — Dennis
– Dennis, Commented Dec 31, 2011 at 4:52
But I'm actually still interested in this problem in the abstract, because after this I have plans to do a larger system this way. It will be a planning AI which would use the same architecture of objects + interactions; basically you would tell the planning AI "I have these objects which support these interactions, figure out some ways to combine them". The cardinality of those sets would be much higher; but I wouldn't have to find all interactions in the planning AI case, just some. — Dennis
– Dennis, Commented Dec 31, 2011 at 4:55

Sergey Kalinichenko · Accepted Answer · 2011-12-31 05:11:53Z

4

If your sets are really small, the best representation is using bit sets. First, you build a map from strings to consecutive integers 0..N, where N is the number of distinct strings. Then you build your sets by bitwise OR-ing of 1<<k into a number. This lets you turn your set operations into bitwise operations, which are extremely fast (an intersection is an &; a union is an |, and so on).

Here is an example: Let's say you have two sets, A={quick, brown, fox} and B={brown, lazy, dog}. First, you build a string-to-number map, like this:

quick - 0
brown - 1
fox   - 2
lazy  - 3
dog   - 4

Then your sets would become A=00111b and B=11010b. Their intersection is A&B = 00010b, and their union is A|B = 11111b. You know a set X is a subset of set Y if X == X&Y.

answered Dec 31, 2011 at 5:11

Sergey Kalinichenko

729k85 gold badges1.2k silver badges1.6k bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Dennis Over a year ago

I was really torn whether to mark your answer as the accepted one, or mcdowella's. Your solution is simpler and is clearly the best choice for small N. Ultimately I decided to mark mcdowella's answer because what I really wanted to know was if there is a way to do it that doesn't involve testing each one, and his link to the Rete algorithm is the general algorithm I was hoping for. I just wanted to let you know I could just as easily consider this one the accepted answer too.

Sergey Kalinichenko Over a year ago

@Dennis Using Rete for a set of four..eight objects sounds like an overkill of great proportions: hitting a fly with a sledgehammer would look notmal next to it :) I tried implementing Rete once (as a fun exercise) and I quickly learned that it is complex. I mean, seriously complex. Anyway, good luck!

Dennis Over a year ago

I think you're right; I changed the accepted answer to this one; it wouldn't be fair to present the actual simple case I have to solve and then reject the answer because it isn't the right big-O complexity for the general case. I thought of bit flags in integers too before I asked the question, so I guess this is confirmation that that was the right track all along.

mcdowella · Accepted Answer · 2011-12-31 05:14:50Z

2

One way to do this would be to keep, for each subset, a count of how many of its strings were not in the main set, and a map from strings to lists of subsets containing that string, so that you can update the counts when you add or remove a new string to the active set, and notice when a count goes down to zero.

This problem reminds me of firing rules in a rule-based system when a fact becomes true, which corresponds to a new string being added to the active set. Many of these systems use http://en.wikipedia.org/wiki/Rete_algorithm. http://www.jboss.org/drools/drools-expert.html is an open source rule-based system - although it looks like there is a lot of enterprise system wrapping round it these days.

answered Dec 31, 2011 at 5:14

mcdowella

19.6k2 gold badges21 silver badges25 bronze badges

1 Comment

Dennis Over a year ago

Thank you for the link to the Rete algorithm. It's probably overkill for simple case but I may want to use it in the future for the general case.

Collectives™ on Stack Overflow

Algorithm / Data Structure for Finding Which of Many Sets are Subsets of another Set

2 Answers 2

3 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related