2

I'm new to ruby, and clearly see and find online that the following fails:

arr = [10, 20, 30, 40]     
arr.each.with_index do |elmt, i|
  print "#{elmt}, #{i}, "
  arr.delete_at(i) if elmt == 20
  puts arr.length
end

Clearly the delete_at is interacting with the iterator, but I cannot find a clear description of how the iterator and delete_at work such that this is so. (BTW, I understand solutions that work - I'm not looking for a correct way to do this, I'm trying to understand the semantics such that I know why this is not doing what's expected.) For completeness, here's the output

10, 0, 4
20, 1, 3
40, 2, 3
=> [10, 30, 40]
2
  • 1
    Changing an Array (or generally any Enumerable) while iterating over it produces undefined behaviour (at least in MRI). To cite Matz (the creator of Ruby): "It is undefined behavior of iterators if the block modifies iterating object. It may crash or fall in to infinite loop." Commented Mar 5, 2015 at 13:02
  • @Holger good info. I wish this was in the documentation for the relevant methods. Dare I go so far as to call it a doc bug? Commented Mar 5, 2015 at 16:53

3 Answers 3

1

lightbulb!!

It seems clear the delete_at immediately adjusts the underlying data structure, left shifting all elements to the right of the deleted item. The array seems more like a linked list than an array. So the "next" item (here "30") is now arr[1] (which the iterator has already processed), and when the iterator increments to arr[2], it sees "40". arr[3] returns nil and puts seems to do nothing with nil.

Sign up to request clarification or add additional context in comments.

5 Comments

arr[3] would return nil, but if you add another puts statement you can see that the iteration never actually reaches arr[3], because the length of the array is less than 4 by the time the Ruby runtime gets to that point. I made a version of your code that prints indices as it runs: gist.github.com/DavidEGrayson/31250dfa0a0dd8b443ed
@David Right - printing the indices makes it clearer. I'll update the question. Interesting point regarding the run time seeing the length at the top of the block and not entering the block for the last loop.
Hey, IMO, it will be clear for anyone who is familiar with how yielding values from self to blocks works. I mean while yielding to a block, it doesn't have to know if self has changed or not, just passes to the block what's at self[i] at the moment at every yield. But I believe many newcomers will overlook this and others would just use Array#reject, Array#select, and Array#delete_if.
@llkin - so I have a todo on my list of learning - understand "yielding values from self to blocks". Any great summary you'd point me to?
@user3546411 : Not that necessary (but helps and good to learn). What you said in the answer is true, I just said it from the Ruby side. The values are passed to the blocks iteratively from the original array, delete_at modifies the original array, and there is no interaction between the two.
1

look at: https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/include/ruby/intern.h

https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/enumerator.c

https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/array.c

each gives you an enumerator, and the with_index works with the enumerator.

when you reach element 20 you print it out, and after that you erase it, and at this point Ruby effectively shifts down all elements in the array. Now the enumerator which is backed by the array picks up the next element which is 40 because everything got shifted down (30 was copied over 20, 40 was copied over 30 and array was resized)

take a look at: https://github.com/ruby/ruby/blob/ca6b174078fa15f33655be704d9409fdbc4f9929/array.c#L3023 It's where the magic of moving the elements via a memmove happens.

4 Comments

This is all very interesting because you need an understanding of what's happening under the hood to know what the performance implications are. An algorithm that does a lot of deletes on a large array in a nested loop is going to have a big performance multiplier (likely O(log(n)) using this method. I am hoping something like array.keep_if, which can process the whole thing at once, would be implemented more efficiently, but hmmm... probably not? Other than looking at the interpreter source code, is there a book or website which discusses this dyad of implementation and performance?
Ruby is not the fastest thing out there. The power/beauty of Ruby comes from its expressiveness, not speed. I have rarely seen ruby code that deletes elements in an array. Usually there is a variation on transformations on data structures, these transformations being chained in a a functional style. If you need/want you persist the final result of this transformations - most times it's faster/cheaper to do so than to alter the original structure.
in general, as a rule of the thumb (and this applies to every language out there) you should not alter a collection while you're iterating over it - this is one of the reasons why you're not allowed to do this. other languages are more strict about it (raising errors/exceptions when this happens), other languages are more relaxed with the caveat of unexpected results like this.
once I understood what was happening, it was no problem building an algorithm that uses delete_at. But it's highly inefficient, because it shifts the whole array each time it deletes an element. Using keep_if creates a new array, doing one copy of elements which are not deleted. So the delete_at is an O(n) method; keep_if is an O(log(n)) method. For a huge string (for fun I concatenated all the words in a dictionary) the difference was #real 0m2.865s (keep_if) versus #real 26m7.780s (delete_at). I'm back to where is a good per function description of what happens.
1

Let's step through it:

arr = [10, 20, 30, 40]  
enum0 = arr.each
  #=> #<Enumerator: [10, 20, 30, 40]:each> 
enum1 = enum0.with_index
  #=> #<Enumerator: #<Enumerator: [10, 20, 30, 40]:each>:with_index> 

We can see the contents of enum1 by converting it to an array:

enum1.to_a
  #=> [[10, 0], [20, 1], [30, 2], [40, 3]] 

This tells us that Enumerator#each (which will invoke Array#each) will pass the four elements of the enumerator enum1 into the block, assigning them in turn to the block variables. The first is:

elmt, i = enum1.next
  #=> [10, 0] 
puts elmt, i
  # 10
  #  0
elmt == 20
  #=> false 

so arr.delete_at(i) is not executed.

Neither arr nor enum1 have been altered:

arr
  #=> [10, 20, 30, 40]
enum1.to_a
  #=> [[10, 0], [20, 1], [30, 2], [40, 3]] 

each now passes the next element of enum1 into the block:

elmt, i = enum1.next
  #=> [20, 1] 
elmt == 20
  #=> true 

so we execute:

arr.delete_at(i)
  #=> [10, 20, 30, 40].delete_at(1)
  #=> 20 
arr
  #=> [10, 30, 40] 
enum1.to_a
  #=> [[10, 0], [30, 1], [40, 2]]

Ah! So the enumerator has been changed as well as arr. That makes perfect sense, because when the enumerator is created a reference to the original receiver is established, together with rules for what is to be be done with it. Changes to the receiver will therefore affect the enumerator.

We can use Enumerator#peek to see what will be the next element of enum1 that each will pass into the block:

enum1.peek
  #=> [40, 2]

So you see that each moves on to the next indexed position, oblivious to the fact that an earlier element has been removed, causing the later elements to each shift down by one position, causing each to skip [30,1].

elmt, i = enum1.next
  #=> [40, 2] 
elmt == 20
  #=> false 
arr
  #=> [10, 30, 40] 
enum1.to_a
  #=> [[10, 0], [30, 1], [40, 2]] 

At this point each reaches the end of the enumerator, so it's job is finished. It therefore returns the original receiver, arr, but that has been modified, so we get:

[10, 30, 40] 

A better example might be:

arr = [10, 20, 20, 40]  

where:

[10, 20, 40] 

would be returned.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.