23

I was looking for an Array equivalent of String#split in Ruby core, and was surprised to find that none exists. Is there a more elegant way than the following to split an array into sub-arrays based on a value?

class Array
  def split( split_on=nil )
    inject([[]]) do |a,v|
      a.tap{
        if block_given? ? yield(v) : v==split_on
          a << []
        else
          a.last << v
        end
      }
    end.tap{ |a| a.pop if a.last.empty? }
  end
end

p (1..9 ).to_a.split{ |i| i%3==0 },
  (1..10).to_a.split{ |i| i%3==0 }
#=> [[1, 2], [4, 5], [7, 8]]
#=> [[1, 2], [4, 5], [7, 8], [10]]

Edit: For those interested, the "real-world" problem which sparked this request can be seen in this answer, where I've used @fd's answer below for the implementation.

10
  • Well, in Python you could convert it into a string (values separated by commas or something), split that, and then go back to a list. Dunno if that's an option in Ruby. Commented Jan 26, 2011 at 0:24
  • @Rafe It would be, but only if the contents were only strings. Even then, that could hardly be considered elegant. :p Commented Jan 26, 2011 at 0:38
  • @Phrogz if they were numbers it'd work fine too. You'd just do ','.join([str(x) for x in list_of_nums]), then split on whatever, then rejoin and split on commas. Functional, yes, elegant, eh no. Commented Jan 26, 2011 at 0:45
  • @Rafe Perhaps I should also accept answers for most roundabout hack. To/from YAML, anyone? :) Commented Jan 26, 2011 at 1:22
  • FYI: I don't see anything in your solution that requires self to be an Array. You could pull that method up into Enumerable, since you only depend on self responding to inject. (Incidentally, that would also allow you to get rid of the to_a in your two test cases.) Commented Jan 26, 2011 at 11:16
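
A minimal sketch of what the last comment suggests, assuming the same semantics as the question's Array#split. The method name split_sketch is hypothetical (chosen to avoid clashing with anything else in this thread); the body relies only on inject, so it works on any Enumerable, including a Range without to_a.

```ruby
module Enumerable
  # Hypothetical name; same logic as the question's Array#split,
  # but defined on Enumerable because it only needs #inject.
  def split_sketch(split_on = nil)
    result = inject([[]]) do |acc, v|
      separator = block_given? ? yield(v) : v == split_on
      separator ? (acc << []) : (acc.last << v)
      acc # inject needs the accumulator back on every iteration
    end
    result.pop if result.last.empty? # drop a trailing empty group
    result
  end
end

p((1..9).split_sketch  { |i| i % 3 == 0 }) #=> [[1, 2], [4, 5], [7, 8]]
p((1..10).split_sketch { |i| i % 3 == 0 }) #=> [[1, 2], [4, 5], [7, 8], [10]]
```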

5 Answers

20

Sometimes partition is a good way to do things like that:

(1..6).partition { |v| v.even? } 
#=> [[2, 4, 6], [1, 3, 5]]

1 Comment

Irrelevant to the question: the author wants to split delimited running sequences.
14

I tried golfing it a bit, still not a single method though:

(1..9).chunk{|i|i%3==0}.reject{|sep,ans| sep}.map{|sep,ans| ans}

Or faster:

(1..9).chunk{|i|i%3!=0 || nil}.map{|sep,ans| ans}

Also, Enumerable#chunk seems to be Ruby 1.9+, but it is very close to what you want.

For example, the raw output would be:

(1..9).chunk{ |i|i%3==0 }.to_a                                       
=> [[false, [1, 2]], [true, [3]], [false, [4, 5]], [true, [6]], [false, [7, 8]], [true, [9]]]

(The to_a is to make irb print something nice, since chunk gives you an enumerator rather than an Array)


Edit: Note that the above elegant solutions are 2-3x slower than the fastest implementation:

module Enumerable
  def split_by
    result = [a=[]]
    each{ |o| yield(o) ? (result << a=[]) : (a << o) }
    result.pop if a.empty?
    result
  end
end

6 Comments

Nice! I hadn't seen chunk before. For the record, it's 1.9.2+, but that's wholly acceptable to me.
Not surprisingly (due to the extra iterations needed for reject/map) chunk is a good bit slower; I've added a benchmarking 'answer' collecting implementations.
(1..10).chunk{|n| n % 3 == 0 ? :_separator : :keep}.map{|_,v| v}
(1..10).chunk{|n| n%3==0 || nil}.map{|_,v| v}
5

Here are benchmarks aggregating the answers (I'll not be accepting this answer):

require 'benchmark'
a = *(1..5000); N = 1000
Benchmark.bmbm do |x|
  %w[ split_with_inject split_with_inject_no_tap split_with_each
      split_with_chunk split_with_chunk2 split_with_chunk3 ].each do |method|
    x.report( method ){ N.times{ a.send(method){ |i| i%3==0 || i%5==0 } } }
  end
end
#=>                                user     system      total        real
#=> split_with_inject          1.857000   0.015000   1.872000 (  1.879188)
#=> split_with_inject_no_tap   1.357000   0.000000   1.357000 (  1.353135)
#=> split_with_each            1.123000   0.000000   1.123000 (  1.123113)
#=> split_with_chunk           3.962000   0.000000   3.962000 (  3.984398)
#=> split_with_chunk2          3.682000   0.000000   3.682000 (  3.687369)
#=> split_with_chunk3          2.278000   0.000000   2.278000 (  2.281228)

The implementations being tested (on Ruby 1.9.2):

class Array
  def split_with_inject
    inject([[]]) do |a,v|
      a.tap{ yield(v) ? (a << []) : (a.last << v) }
    end.tap{ |a| a.pop if a.last.empty? }
  end

  def split_with_inject_no_tap
    result = inject([[]]) do |a,v|
      yield(v) ? (a << []) : (a.last << v)
      a
    end
    result.pop if result.last.empty?
    result
  end

  def split_with_each
    result = [a=[]]
    each{ |o| yield(o) ? (result << a=[]) : (a << o) }
    result.pop if a.empty?
    result
  end

  def split_with_chunk
    chunk{ |o| !!yield(o) }.reject{ |b,a| b }.map{ |b,a| a }
  end

  def split_with_chunk2
    chunk{ |o| !!yield(o) }.map{ |b,a| b ? nil : a }.compact
  end

  def split_with_chunk3
    chunk{ |o| yield(o) || nil }.map{ |b,a| b && a }.compact
  end
end

1 Comment

A bit late to the party, but: these methods aren't entirely comparable, because they don't all return the same results. The first three return something similar to what String#split returns (including empty arrays when two consecutive separators are found), while split_with_chunk and split_with_chunk2 never return empty arrays, and split_with_chunk3 still contains the 'grouping' value of chunk.
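
To make the difference in that comment concrete, here is a small sketch comparing an each-based splitter with a chunk-based one on input that has two consecutive separators. Both helper names are hypothetical stand-ins, not the benchmarked methods themselves; the chunk variant uses the documented :_separator key, which tells chunk to drop the element.

```ruby
# Hypothetical helper: same logic as split_with_each, as a plain method.
def split_each_sketch(items)
  result = [current = []]
  items.each { |o| yield(o) ? (result << (current = [])) : (current << o) }
  result.pop if current.empty?
  result
end

# Hypothetical helper: chunk drops elements whose key is :_separator.
def split_chunk_sketch(items)
  items.chunk { |o| yield(o) ? :_separator : true }.map { |_key, group| group }
end

data = [1, 3, 6, 4, 5] # 3 and 6 are adjacent separators
p split_each_sketch(data)  { |i| i % 3 == 0 } #=> [[1], [], [4, 5]]
p split_chunk_sketch(data) { |i| i % 3 == 0 } #=> [[1], [4, 5]]
```

The each-based version keeps an empty group between adjacent separators; the chunk-based version silently collapses them.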
1

Other Enumerable methods you might want to consider are each_slice and each_cons.

I don't know how general you want it to be, but here's one way:

>> (1..9).each_slice(3) {|a| p a.size>1?a[0..-2]:a}
[1, 2]
[4, 5]
[7, 8]
=> nil
>> (1..10).each_slice(3) {|a| p a.size>1?a[0..-2]:a}
[1, 2]
[4, 5]
[7, 8]
[10]

1 Comment

Only for my specific mod 3 example, but not in general.
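
A quick sketch of why that is: each_slice groups by position, so it only matches a split when the separators happen to fall on every third element. The data and variable names below are made up for illustration.

```ruby
data = [1, 2, 0, 3, 0, 4, 5, 6] # 0 is the intended separator

# Position-based: slices of three, dropping the last element of each slice.
by_position = data.each_slice(3).map { |a| a.size > 1 ? a[0..-2] : a }

# Value-based: chunk drops elements whose key is :_separator.
by_value = data.chunk { |x| x == 0 ? :_separator : true }.map { |_key, group| group }

p by_position #=> [[1, 2], [3, 0], [5]]
p by_value    #=> [[1, 2], [3], [4, 5, 6]]
```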
1

Here is another one (with a benchmark comparing it to split_with_each, the fastest implementation here: https://stackoverflow.com/a/4801483/410102):

require 'benchmark'

class Array
  def split_with_each
    result = [a=[]]
    each{ |o| yield(o) ? (result << a=[]) : (a << o) }
    result.pop if a.empty?
    result
  end

  def split_with_each_2
    u, v = [], []
    each{ |x| (yield x) ? (u << x) : (v << x) }
    [u, v]
  end
end

a = *(1..5000); N = 1000
Benchmark.bmbm do |x|
  %w[ split_with_each split_with_each_2 ].each do |method|
    x.report( method ){ N.times{ a.send(method){ |i| i%3==0 || i%5==0 } } }
  end
end

                        user     system      total        real
split_with_each     2.730000   0.000000   2.730000 (  2.742135)
split_with_each_2   2.270000   0.040000   2.310000 (  2.309600)

1 Comment

This is like Array#partition, not String#split.
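
As a quick check of that comment, the two-bucket loop from split_with_each_2 gives the same result as the built-in partition (a sketch; the variable names are arbitrary):

```ruby
a = (1..10).to_a

# Same logic as split_with_each_2: one bucket for matches, one for the rest.
u, v = [], []
a.each { |x| (x % 3 == 0) ? (u << x) : (v << x) }

builtin = a.partition { |x| x % 3 == 0 }
p [u, v]            #=> [[3, 6, 9], [1, 2, 4, 5, 7, 8, 10]]
p [u, v] == builtin #=> true
```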
