data = [[0,1], [1,6,10], [], [1,2,4,5], [7,8], [], [], [8], [2], [0,3], [9]]

Given the above 2D array, I need to select the five sub-arrays that together contain the most unique numbers.

For example

# returns 11 (optimal output, the number of subclasses)
(data[1] | data[3] | data[4] | data[9] | data[10]).length 
# returns 10 (less optimal output)
(data[0] | data[1] | data[3] | data[4] | data[10]).length 

Doing it the brute force way takes far too long to complete. Are there any other suggestions?

  • Can you please explain it more clearly? Commented Jan 23, 2017 at 15:39
  • Does "most unique" mean "least duplication"? This is a permutation problem, so it's not going to be terribly efficient. There are no algorithms that magically solve this in the general case. Commented Jan 23, 2017 at 16:01

3 Answers


Here's something that does it:

data = [[0,1], [1,6,10], [], [1,2,4,5], [7,8], [], [], [8], [2], [0,3], [9]]

best = data.combination(5).max_by do |combo|
  combo.flatten.uniq.length
end

best
# => [[1, 6, 10], [1, 2, 4, 5], [7, 8], [0, 3], [9]]
best.flatten.uniq.length
# => 11

It doesn't take long to compute, and there are probably better ways of optimizing that inner loop if you're prepared to use Benchmark for testing.

If you need orders of magnitude better performance, maybe a C++ library linked in via FFI is the answer.

If you're dealing with numbers that are relatively small, like in the range of 0..31 or even 0..63, then you could do this with bitmasks. That would reduce each array to a single value, and combining values with OR is trivial in terms of compute. Counting the number of bits in a given value is likewise pretty simple.
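For illustration, here's a rough sketch of that bitmask idea (it assumes the values are non-negative integers; Ruby's Integers are arbitrary precision, so larger values still work, just with less of a speed advantage):

data = [[0,1], [1,6,10], [], [1,2,4,5], [7,8], [], [], [8], [2], [0,3], [9]]

# Turn each sub-array into a bitmask: bit i is set if the value i is present.
masks = data.map { |arr| arr.reduce(0) { |mask, x| mask | (1 << x) } }

# OR the masks of each 5-combination together and count the set bits.
best = masks.combination(5).max_by do |combo|
  combo.reduce(0, :|).to_s(2).count("1")
end

best.reduce(0, :|).to_s(2).count("1")
# => 11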


4 Comments

There are 12 numbers in the result, but only 11 unique numbers (1 occurs twice).
BTW, I think you (only) need combination, not permutation.
@Stefan Great point, and it does run a lot faster that way. I didn't notice the duplication either, so that's also addressed.
@kouty I've updated the answer with Stefan's observations.

Here's a greedy algorithm.

On each iteration it simply takes the sub-array that contributes the most new elements. This is the standard greedy heuristic for maximum coverage: it works for your example, but it can miss the optimum by a few elements on other inputs.

For large arrays and large n, it should be much faster than any solution using combination.

You didn't provide any code, so I'll leave it as an exercise to look for counterexamples ;).

data = [[0, 1], [1, 6, 10], [], [1, 2, 4, 5], [7, 8], [], [], [8], [2], [0, 3], [9]]

def trim(array, already_taken)
  array.map { |sub_array| sub_array - already_taken }.reject(&:empty?)
end

def find_best_cover(array, n)
  array = array.map{ |subarray| subarray.uniq }
  Array.new(n) do
    next_best = array.max_by { |subarray| subarray.size }
    array = trim(array, next_best)
    next_best
  end
end

p find_best_cover(data, 5).flatten
#=> [1, 2, 4, 5, 6, 10, 7, 8, 0, 3, 9]
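One caveat with the sketch above: if n is larger than the number of sub-arrays that still contribute new elements, max_by returns nil and trim will raise. Here's a hedged variant (find_best_cover_safe is just an illustrative name) that stops early instead, assuming returning fewer than n groups is acceptable in that case:

def find_best_cover_safe(array, n)
  array = array.map(&:uniq)
  result = []
  n.times do
    break if array.empty?           # nothing left to cover
    next_best = array.max_by(&:size)
    array = trim(array, next_best)
    result << next_best
  end
  result
end

p find_best_cover_safe(data, 20).flatten
#=> [1, 2, 4, 5, 6, 10, 7, 8, 0, 3, 9]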



You can reduce computation time by reducing your data array.

Initially, there are 462 combinations:

data.combination(5).size
#=> 462

Deleting empty arrays reduces this to 56:

data.reject!(&:empty?)

data.combination(5).size
#=> 56

And deleting arrays that are fully contained in other arrays results in merely 6 combinations:

data -= [[2], [8]]

data.combination(5).size
#=> 6
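If you'd rather not spot those containments by hand, here's a rough sketch that drops every sub-array fully contained in another one (empty arrays are subsets of everything, so they are removed too; note that two equal sub-arrays would both be dropped, so call data.uniq first if duplicates can occur):

data = [[0,1], [1,6,10], [], [1,2,4,5], [7,8], [], [], [8], [2], [0,3], [9]]

# Reject any sub-array that is a subset of some *other* sub-array.
reduced = data.reject do |sub|
  data.any? { |other| !other.equal?(sub) && (sub - other).empty? }
end

reduced.combination(5).size
#=> 6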

