data = [[0,1], [1,6,10], [], [1,2,4,5], [7,8], [], [], [8], [2], [0,3], [9]]

Given the above 2D array, I need to select the five sub-arrays that together contain the most unique numbers.

For example

# returns 11 (optimal output, the number of subclasses)
(data[1] | data[3] | data[4] | data[9] | data[10]).length 
# returns 10 (less optimal output)
(data[0] | data[1] | data[3] | data[4] | data[10]).length 

Doing it the brute force way takes far too long to complete. Are there any other suggestions?

  • Can you please explain it more clearly? Commented Jan 23, 2017 at 15:39
  • Does "most unique" mean "least duplication"? This is a permutation problem, so it's not going to be terribly efficient. There are no algorithms that magically solve this in the general case. Commented Jan 23, 2017 at 16:01

3 Answers


Here's something that does it:

data = [[0,1], [1,6,10], [], [1,2,4,5], [7,8], [], [], [8], [2], [0,3], [9]]

best = data.combination(5).max_by do |combo|
  combo.flatten.uniq.length
end

best
# => [[1, 6, 10], [1, 2, 4, 5], [7, 8], [0, 3], [9]]
best.flatten.uniq.length
# => 11

It doesn't take long to compute, and there are probably better ways of optimizing that inner loop if you're prepared to use Benchmark for testing.

If you need orders of magnitude better performance, maybe a C++ library linked in via FFI is the answer.

If you're dealing with numbers that are relatively small, like in the range of 0..31 or even 0..63, then you could do this with bitmasks. That would reduce each array to a single value, and combining values with OR is trivial in terms of compute. Counting the number of bits in a given value is likewise pretty simple.
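For illustration, here's a rough sketch of that bitmask idea (it assumes the values are non-negative integers; Ruby's Integers are arbitrary precision, so larger values still work, just with less of a speed advantage):

data = [[0,1], [1,6,10], [], [1,2,4,5], [7,8], [], [], [8], [2], [0,3], [9]]

# Turn each sub-array into a bitmask: bit i is set if the value i is present.
masks = data.map { |arr| arr.reduce(0) { |mask, x| mask | (1 << x) } }

# OR the masks of each 5-combination together and count the set bits.
best = masks.combination(5).max_by do |combo|
  combo.reduce(0, :|).to_s(2).count("1")
end

best.reduce(0, :|).to_s(2).count("1")
# => 11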


4 Comments

There are 12 numbers in the result, but only 11 unique numbers (1 occurs twice).
BTW, I think you (only) need combination, not permutation.
@Stefan Great point, and it does run a lot faster that way. I didn't notice the duplication either, so that's also addressed.
@kouty I've updated the answer with Stefan's observations.

Here's a greedy algorithm.

On each iteration it simply takes the sub-array that contributes the most new elements. This is the standard greedy heuristic for maximum coverage: it works for your example, but it can miss the optimum by a few elements on other inputs.

For large arrays and large n, it should be much faster than any solution using combination.

You didn't provide any code, so I'll leave it as an exercise to look for counterexamples ;).

data = [[0, 1], [1, 6, 10], [], [1, 2, 4, 5], [7, 8], [], [], [8], [2], [0, 3], [9]]

def trim(array, already_taken)
  array.map { |sub_array| sub_array - already_taken }.reject(&:empty?)
end

def find_best_cover(array, n)
  array = array.map{ |subarray| subarray.uniq }
  Array.new(n) do
    next_best = array.max_by { |subarray| subarray.size }
    array = trim(array, next_best)
    next_best
  end
end

p find_best_cover(data, 5).flatten
#=> [1, 2, 4, 5, 6, 10, 7, 8, 0, 3, 9]
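One caveat with the sketch above: if n is larger than the number of sub-arrays that still contribute new elements, max_by returns nil and trim will raise. Here's a hedged variant (find_best_cover_safe is just an illustrative name) that stops early instead, assuming returning fewer than n groups is acceptable in that case:

def find_best_cover_safe(array, n)
  array = array.map(&:uniq)
  result = []
  n.times do
    break if array.empty?           # nothing left to cover
    next_best = array.max_by(&:size)
    array = trim(array, next_best)
    result << next_best
  end
  result
end

p find_best_cover_safe(data, 20).flatten
#=> [1, 2, 4, 5, 6, 10, 7, 8, 0, 3, 9]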



You can reduce computation time by reducing your data array.

Initially, there are 462 combinations:

data.combination(5).size
#=> 462

Deleting empty arrays reduces this to 56:

data.reject!(&:empty?)

data.combination(5).size
#=> 56

And deleting arrays that are fully contained in other arrays results in merely 6 combinations:

data -= [[2], [8]]

data.combination(5).size
#=> 6
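If you'd rather not spot those containments by hand, here's a rough sketch that drops every sub-array fully contained in another one (empty arrays are subsets of everything, so they are removed too; note that two equal sub-arrays would both be dropped, so call data.uniq first if duplicates can occur):

data = [[0,1], [1,6,10], [], [1,2,4,5], [7,8], [], [], [8], [2], [0,3], [9]]

# Reject any sub-array that is a subset of some *other* sub-array.
reduced = data.reject do |sub|
  data.any? { |other| !other.equal?(sub) && (sub - other).empty? }
end

reduced.combination(5).size
#=> 6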

