186

What's the most efficient way to test if an array contains any element from a second array?

Two examples below, attempting to answer the question does foods contain any element from cheeses:

cheeses = %w(chedder stilton brie mozzarella feta haloumi reblochon)
foods = %w(pizza feta foods bread biscuits yoghurt bacon)

puts cheeses.collect{|c| foods.include?(c)}.include?(true)

puts (cheeses - foods).size < cheeses.size

6 Answers 6

314
(cheeses & foods).empty?

As Marc-André Lafortune said in comments, & works in linear time while any? + include? will be quadratic. For larger sets of data, linear time will be faster. For small data sets, any? + include? may be faster as shown by Lee Jarvis' answer -- probably because & allocates a new Array while another solution does not and works as a simple nested loop to return a boolean.

Sign up to request clarification or add additional context in comments.

8 Comments

When checking if an array contains an element from another array, wouldn't it make more sense to do (cheeses & foods).any? as this returns a true value if the arrays' do in fact contain any of the same elements?
@RyanFrancis, docs: any?: The method returns true if the block ever returns a value other than false or nil. empty?: Returns true if self contains no elements.
@Nakilon I'm also confused why answer isn't (cheeses & foods).any? isn't the OP's question: if any foods are in cheeses? In his example, "feta" is in both, so result should be true, right? So why check .empty? on the intersection?
@SuckerForMayhem, because OP's question is "If any are ...?", not just "If any?". If "are ..." is omitted it's assumed to be "If any is True?" and would return False for array like [false, false, false], while it obviously is not empty.
Is there any implementation in activerecord level?
|
47

How about Enumerable#any?

>> cheeses = %w(chedder stilton brie mozzarella feta haloumi)
=> ["chedder", "stilton", "brie", "mozzarella", "feta", "haloumi"]
>> foods = %w(pizza feta foods bread biscuits yoghurt bacon)
=> ["pizza", "feta", "foods", "bread", "biscuits", "yoghurt", "bacon"]
>> foods.any? {|food| cheeses.include?(food) }
=> true

Benchmark script:

require "benchmark"
N = 1_000_000
puts "ruby version: #{RUBY_VERSION}"

CHEESES = %w(chedder stilton brie mozzarella feta haloumi).freeze
FOODS = %w(pizza feta foods bread biscuits yoghurt bacon).freeze

Benchmark.bm(15) do |b|
  b.report("&, empty?") { N.times { (FOODS & CHEESES).empty? } }
  b.report("any?, include?") { N.times { FOODS.any? {|food| CHEESES.include?(food) } } }
end

Result:

ruby version: 2.1.9
                      user     system      total        real
&, empty?         1.170000   0.000000   1.170000 (  1.172507)
any?, include?    0.660000   0.000000   0.660000 (  0.666015)

3 Comments

You can improve this by turning cheeses into a set.
Ran my own benchmark here on ruby 2.2.7 and 2.3.4 and any?, include? was the fastest, set disjoint the slowest: gist.github.com/jaredmoody/d2a1e83de2f91fd6865920cd01a8b497
This benchmark is biased by the specific example mentioned and does not necessarily hold in a more general case. What if there were no common elements between the two arrays? What if the arrays were in a different order on each pass? What if feta appeared at the end of both arrays? As Marc-André stated, set intersection executes in linear time, so it makes sense that it is much more scalable for the general case, rather than the one specific example used purely to clarify the question.
27

You can check if the intersection is empty.

cheeses = %w(chedder stilton brie mozzarella feta haloumi)
foods = %w(pizza feta foods bread biscuits yoghurt bacon)
foods & cheeses
=> ["feta"] 
(foods & cheeses).empty?
=> false

Comments

7

you can use intersect? since ruby 3.1

foods.intersect?(cheeses)

Comments

3
require "benchmark"
N = 1_000_000
puts "ruby version: #{RUBY_VERSION}"

CHEESES = %w(chedder stilton brie mozzarella feta haloumi).freeze
FOODS = %w(pizza feta foods bread biscuits yoghurt bacon).freeze

Benchmark.bm(15) do |b|
  b.report("&, empty?") { N.times { (FOODS & CHEESES).empty? } }  
  b.report("any?, include?") { N.times { FOODS.any? {|food| CHEESES.include?(food) } } }  
  b.report("disjoint?") { N.times { FOODS.to_set.disjoint? CHEESES.to_set }}
end  
                      user     system      total        real
&, empty?         0.751068   0.000571   0.751639 (  0.752745)
any?, include?    0.408251   0.000133   0.408384 (  0.408438)
disjoint?        11.616006   0.014806  11.630812 ( 11.637300)

Comments

1
Set.new(cheeses).disjoint? Set.new(foods)

3 Comments

Also in my (unscientific) benchmark, set disjoint was significantly slower than other methods: gist.github.com/jaredmoody/d2a1e83de2f91fd6865920cd01a8b497
Thanks for your comments. I'm not sure why it wasn't Set.new but I just edited it. I tried your performance benchmarks in 2.4.1. Mine did better but still not best using disjointed sets containing more words. I put my version in a comment on your gist. I also think disjoint? is very elegant, especially compared to "any?, include?". The original question did ask about both elegant and efficient.
.to_set method can be useful here cheeses.to_set.disjoint?(foods.to_set)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.