4

I'm working with some large datasets, and trying to improve performance. I need to determine whether an object is contained in an array. I was considering using either index or include?, so I benchmarked both.

require 'benchmark'

a = (1..1_000_000).to_a
num = 100_000
reps = 100

Benchmark.bmbm do |bm|
  bm.report('include?') do
    reps.times { a.include? num }
  end
  bm.report('index') do
    reps.times { a.index num }
  end
end

Surprisingly (to me), index was considerably faster.

               user     system      total        real
include?   0.330000   0.000000   0.330000 (  0.334328)
index      0.040000   0.000000   0.040000 (  0.039812)

Since index provides more information than include?, I would have expected it to be slightly slower if anything, although this was not the case. Why is it faster?

(I know that index comes directly from the array class, and include? is inherited from Enumerable. Might that explain it?)

4
  • Nice catch. I think you should file an issue to MRI. Commented Sep 11, 2014 at 6:02
  • Related: stackoverflow.com/questions/23729175/… Commented Sep 11, 2014 at 6:15
  • @undur_gongor Thanks. And done. Commented Sep 11, 2014 at 6:35
  • 2
    @Sparhawk: The issue has been fixed since. Commented Sep 22, 2014 at 8:57

3 Answers 3

5

Looking at the Ruby MRI source, it seems that index uses the optimized rb_equal_opt while include? uses rb_equal. This can be seen in rb_ary_includes and rb_ary_index. Here is the commit that made the change. Its not immediately clear to me why its used in index and not include?

You might also find it interesting to read the discussion of this feature

Sign up to request clarification or add additional context in comments.

Comments

1

I ran the same test for benchmarking. Seems like include? is faster than index, though it is not very consistent. Here are my results for two different scenarios.

  1. ruby code is same as yours
               user     system      total        real
index      0.065803   0.000652   0.066455 (  0.067181)
include?   0.065551   0.000590   0.066141 (  0.066894)
  1. Another benchmark
                   user     system      total        real
    index      0.000034   0.000005   0.000039 (  0.000037)
    include?   0.000017   0.000001   0.000018 (  0.000017)

code:

require 'benchmark'

# parse ranks and return number of reports to using index
def solution_using_index(ranks)
  return 0 if ranks.nil? || ranks.empty? || ranks.length <= 1
  return ((ranks[0] - ranks[1] == 1) || (ranks[1] - ranks[0] == 1) ?  1 : 0) if ranks.length == 2
  return 0 if ranks.max > 1000000000 || ranks.min < 0

  grouped_ranks = ranks.group_by(&:itself)
  report_to, rank_keys= 0, grouped_ranks.keys
  rank_keys.each {|rank| report_to += grouped_ranks[rank].length if rank_keys.index(rank+1) }
  report_to
end

# parse ranks and return number of reports to using include
def solution_using_include(ranks)
  return 0 if ranks.nil? || ranks.empty? || ranks.length <= 1
  return ((ranks[0] - ranks[1] == 1) || (ranks[1] - ranks[0] == 1) ?  1 : 0) if ranks.length == 2
  return 0 if ranks.max > 1000000000 || ranks.min < 0

  grouped_ranks = ranks.group_by(&:itself)
  report_to, rank_keys= 0, grouped_ranks.keys
  rank_keys.each {|rank| report_to += grouped_ranks[rank].length if rank_keys.include?(rank+1) }
  report_to
end

test_data = [[3, 4, 3, 0, 2, 2, 3, 0, 0], [4, 4, 3, 3, 1, 0], [4, 2, 0] ]

Benchmark.bmbm do |bm|
  bm.report('index') do
    test_data.each do |ranks|
      reports_to = solution_using_index(ranks)
    end
  end
  bm.report('include?') do
    test_data.each do |ranks|
      reports_to = solution_using_include(ranks)
    end
  end
end

2 Comments

Your results seem internally consistent to me? include? is always faster. However, did you read the comments? The bug has been fixed, hence your results.
yes, it was misleading for me, so wanted to justify it. I tested it with 2.6.0, 2.6.0preview1, 2.6.0preview2, 2.6.0preview3 and its consistent.
0

If performance is your goal, you should be using Array#bsearch which traverses the array using binary search.

https://ruby-doc.org/core-2.7.0/Array.html#method-i-bsearch

a.bsearch {|a| num <=> a }

It smokes bothindex and include

Rehearsal --------------------------------------------
include?   0.108172   0.000805   0.108977 (  0.112928)
index      0.122730   0.000502   0.123232 (  0.126323)
bsearch    0.000254   0.000027   0.000281 (  0.000354)
----------------------------------- total: 0.232490sec

               user     system      total        real
include?   0.106727   0.000036   0.106763 (  0.108495)
index      0.107732   0.000330   0.108062 (  0.110272)
bsearch    0.000201   0.000008   0.000209 (  0.000206)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.