0

I have a problem to determine whether a object contains subset of given properties. For instance, I want to look for a object has property of a,b,c,d.. M within N number of objects.

for example

search a,b,c,d    object A - e,g,a,c
                  object B - a,b,c
                  object C - d,c,b
                  object D - a,b,c,d,e

would return object B and object C.

The most straight forward solution is check every single object and see if it has property of a,b,c..M. The worst case would be O(mn) since I need to go through all object and check all property a,b,c..M. You can assume N is quite big, and running time will increase cruelly if M increase. Is there any other efficient way to solve this problem? Thanks

4
  • Why do object A and D not contain a subset of your search? Define characteristics of 'subset' ? Commented Jun 27, 2011 at 1:36
  • Isn't your worst case is O(Nmn) if n is number of properties in an object? Commented Jun 27, 2011 at 1:41
  • @Tungano: go to wiki and search what "subset" means. By definition, a set A is a subset of a set B if A is "contained" inside B. Therefore, object A (a,c,e,g) is not subset of search (a,b,c,d). Commented Jun 27, 2011 at 4:11
  • @Terminal: think about iteration. There are N number of object that contains R number of properties. Additionally, to search M number of properties on each object requires at most M times (we can ignore R > M because it doesn't meet "subset"). Therefore, upper bound is O(NM). Commented Jun 27, 2011 at 4:15

3 Answers 3

1

Unlikely it's possible to make something faster than O(MN).

The problem is that to find the answer we have to analyse all objects and [potentially] all their properties.

OTOH it may be (not sure) possible to make some preprocessing in O(MN) or a bit more and then be able to answer faster...

Sign up to request clarification or add additional context in comments.

Comments

1
  1. First convert your set that you are testing into a dictionary.

  2. For each, set check if each element of that set is contained in the dictionary.

This requires O(n + m) time, where n is the number of elements in the first set, and m is the total number of elements in all other sets.

Here is a solution in python:

def list_sets(search, objects):
    d = dict( [ (x, True) for x in search ] )
    return [ x for x in objects if all( [ y in d for y in x  ] ) ]

Example input:

print list_sets( [ 'a', 'b', 'c', 'd' ],
    [   [ 'e', 'g', 'a', 'c' ],
        [ 'a', 'b', 'c' ],
        [ 'd', 'c', 'b' ],
        [ 'a', 'b', 'c', 'd', 'e' ] ] )

Result:

[['a', 'b', 'c'], ['d', 'c', 'b']]

Comments

0

You could do a first pass to reject any object whose property set size exceeded the size of the search set.

This doesn't improve worst case performance, but could improve real life performance depending upon your data.

1 Comment

actually I already did it, both matching in the middle and at first. But as you said, it doesn't improve worst case. I need to improve upper bound.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.