How can I partition a Perl array into equal sized chunks?

Question

I have a fixed-sized array where the size of the array is always in factor of 3.

my @array = ('foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5);

How can I cluster the member of array such that we can get an array of array group by 3:

$VAR = [ ['foo','bar','qux'],
         ['foo1','bar','qux2'],
         [3, 4, 5] ];

Watch out, all the splice based options below are destructive to your array. You will need to work on a copy if you want to preserve your original array. — daotoad
– daotoad, Commented Sep 29, 2009 at 23:38
This is a very important note re: splice. Addedum: natatime is implemented using splice as well, so is subject to the above note. — DVK
– DVK, Commented Sep 30, 2009 at 12:27

Brad Gilbert · Accepted Answer · 2009-09-30 03:21:11Z

38

my @VAR;
push @VAR, [ splice @array, 0, 3 ] while @array;

or you could use natatime from List::MoreUtils

use List::MoreUtils qw(natatime);

my @VAR;
{
  my $iter = natatime 3, @array;
  while( my @tmp = $iter->() ){
    push @VAR, \@tmp;
  }
}

edited Sep 30, 2009 at 3:21

answered Sep 29, 2009 at 14:18

Brad Gilbert

34.2k11 gold badges80 silver badges131 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

DVK Over a year ago

@Brad - +1 for List::MoreUtils - it's a great gem all-around even outside of this answer.

DVK Over a year ago

Also, please note that - at least as of 2/2009 - there was a memory leak in XS version of natatime (no leak in PP version). See perlmonks.org/?node_id=742364

Brad Gilbert Over a year ago

In Perl 6 you could write it as @array.rotor(3,0)

Brad Gilbert Over a year ago

The design for .rotor in Perl 6 has changed, you would just write it as @array.rotor(3). If you need to get the last few elements when the list doesn't split evenly you would add a :partial to the method call.

daotoad · Accepted Answer · 2009-09-30 19:12:43Z

7

I really like List::MoreUtils and use it frequently. However, I have never liked the natatime function. It doesn't produce output that can be used with a for loop or map or grep.

I like to chain map/grep/apply operations in my code. Once you understand how these functions work, they can be very expressive and very powerful.

But it is easy to make a function to work like natatime that returns a list of array refs.

sub group_by ($@) {
    my $n     = shift;
    my @array = @_;

    croak "group_by count argument must be a non-zero positive integer"
        unless $n > 0 and int($n) == $n;

    my @groups;
    push @groups, [ splice @array, 0, $n ] while @array;

    return @groups;
}

Now you can do things like this:

my @grouped = map [ reverse @$_ ],
              group_by 3, @array;

** Update re Chris Lutz's suggestions **

Chris, I can see merit in your suggested addition of a code ref to the interface. That way a map-like behavior is built in.

# equivalent to my map/group_by above
group_by { [ reverse @_ ] } 3, @array;

This is nice and concise. But to keep the nice {} code ref semantics, we have put the count argument 3 in a hard to see spot.

I think I like things better as I wrote it originally.

A chained map isn't that much more verbose than what we get with the extended API. With the original approach a grep or other similar function can be used without having to reimplement it.

For example, if the code ref is added to the API, then you have to do:

my @result = group_by { $_[0] =~ /foo/ ? [@_] : () } 3, @array;

to get the equivalent of:

my @result = grep $_->[0] =~ /foo/,
             group_by 3, @array;

Since I suggested this for the sake of easy chaining, I like the original better.

Of course, it would be easy to allow either form:

sub _copy_to_ref { [ @_ ] }

sub group_by ($@) {
    my $code = \&_copy_to_ref;
    my $n = shift;

    if( reftype $n eq 'CODE' ) {
        $code = $n;
        $n = shift;
    }

    my @array = @_;

    croak "group_by count argument must be a non-zero positive integer"
        unless $n > 0 and int($n) == $n;

    my @groups;
    push @groups, $code->(splice @array, 0, $n) while @array;

    return @groups;
}

Now either form should work (untested). I'm not sure whether I like the original API, or this one with the built in map capabilities better.

Thoughts anyone?

** Updated again **

Chris is correct to point out that the optional code ref version would force users to do:

group_by sub { foo }, 3, @array;

Which is not so nice, and violates expectations. Since there is no way to have a flexible prototype (that I know of), that puts the kibosh on the extended API, and I'd stick with the original.

On a side note, I started with an anonymous sub in the alternate API, but I changed it to a named sub because I was subtly bothered by how the code looked. No real good reason, just an intuitive reaction. I don't know if it matters either way.

edited Sep 30, 2009 at 19:12

answered Sep 30, 2009 at 0:59

daotoad

27.3k7 gold badges63 silver badges101 bronze badges

4 Comments

Chris Lutz Over a year ago

Why not have group_by take a code reference as the first argument, so we can determine what to do with our group? Usage: group_by { [ @_ ] } 3, @array;

Chris Lutz Over a year ago

Ideal syntax would be group_by 3 { [ @_ ] } @array; but of course we'd need to explicitly declare the anonymous sub for Perl not to whine.

Chris Lutz Over a year ago

The only problem with the second version that uses an optional code reference is that the map { code } @list syntax only works if the subroutine is prototyped to have the first argument be a code reference. As written, you would need to explicitly specify that the code block was a sub (or declare the sub somewhere else and pass a reference to it). Also, I wouldn't have bothered writing a named subroutine for _copy_to_ref() and just said my $code = sub { [ @_ ] }; but that's just me. It might be more efficient to do it your way.

Davor Cubranic Over a year ago

I agree about natatime offering a very limited (and a decidedly un-ModernPerl) API. No chaining, no easy iteration with counting, etc.

Tore Busch · Accepted Answer · 2009-09-29 07:47:36Z

5

Or this:

my $VAR;
while( my @list = splice( @array, 0, 3 ) ) {
    push @$VAR, \@list;
}

answered Sep 29, 2009 at 7:47

Tore Busch

511 bronze badge

Comments

DVK · Accepted Answer · 2009-09-29 08:24:39Z

5

Another answer (a variation on Tore's, using splice but avoiding the while loop in favor of more Perl-y map)

my $result = [ map { [splice(@array, 0, 3)] } (1 .. (scalar(@array) + 2) % 3) ];

answered Sep 29, 2009 at 8:24

DVK

130k33 gold badges219 silver badges337 bronze badges

4 Comments

Chris Lutz Over a year ago

I wouldn't call it more Perl-y just because it uses map() - it's largely much more cluttered and harder to grok. The most "Perl-y" solution is natatime() because it's from CPAN.

DVK Over a year ago

Hmm... I can't say I greatly disagree with you re: possibly harder to grok. But having been a professional Perl developer for many years, I have encountered enough bad-to-horrible junk on CPAN that I don't necessarily consider "uses something from CPAN" to be a Good Householding Seal Of Approval of a perl solution. Mind you, List::MoreUtils, from my cursory examination today, appears to be a very neat and useful module, so it is definitely not included in the gripe above :)

Chris Lutz Over a year ago

@DVK - When I say "because it's from CPAN," I'm lovingly poking fun at the trends of my favorite language, not offering it as the be-all end-all of solutions. We really need to find a way to express sarcasm on the internets.

DVK Over a year ago

Sorry. After 2 sleepless nights, my sarcasm module is not loading.

Eugene Yarmash · Accepted Answer · 2014-09-18 16:57:05Z

4

Another generic solution, non-destructive to the original array:

use Data::Dumper;

sub partition {
    my ($arr, $N) = @_; 

    my @res;
    my $i = 0;

    while ($i + $N-1 <= $#$arr) {
        push @res, [@$arr[$i .. $i+$N-1]];
        $i += $N; 
    }   

    if ($i <= $#$arr) {
        push @res, [@$arr[$i .. $#$arr]];
    }   
    return \@res;
}

print Dumper partition(
    ['foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5], 
    3   
);

The output:

$VAR1 = [
          [
            'foo',
            'bar',
            'qux'
          ],
          [
            'foo1',
            'bar',
            'qux2'
          ],
          [
            3,
            4,
            5
          ]
        ];

answered Sep 18, 2014 at 16:57

Eugene Yarmash

152k44 gold badges346 silver badges391 bronze badges

Comments

Adrian Pronk · Accepted Answer · 2009-09-29 06:44:27Z

3

Try this:

$VAR = [map $_ % 3 == 0 ? ([ $array[$_], $array[$_ + 1], $array[$_ + 2] ]) 
                        : (),
            0..$#array];

edited Sep 29, 2009 at 6:44

answered Sep 29, 2009 at 6:38

Adrian Pronk

14k7 gold badges38 silver badges61 bronze badges

2 Comments

DVK Over a year ago

I'm not sure whether +1 it for cuteness or -1 it for sheer hackiness :) Unvoted it stays.

Karel Bílek Over a year ago

-1, because I would 100% make a mistake somewhere in there :)

Community · Accepted Answer · 2020-06-20 09:12:55Z

As a learning experience I decided to do this in Perl6

The first, perhaps most simplest way I tried was to use map.

my @output := @array.map: -> $a, $b?, $c? { [ $a, $b // Nil, $c // Nil ] };
.say for @output;

foo bar qux
foo1 bar qux2
3 4 5

That didn't seem very scalable. What if I wanted to take the items from the list 10 at a time, that would get very annoying to write. ... Hmmm I did just mention "take" and there is a keyword named take lets try that in a subroutine to make it more generally useful.

sub at-a-time ( Iterable \sequence, Int $n where $_ > 0 = 1 ){
  my $is-lazy = sequence.is-lazy;
  my \iterator = sequence.iterator;

  # gather is used with take
  gather loop {
    my Mu @current;
    my \result = iterator.push-exactly(@current,$n);

    # put it into the sequence, and yield
    take @current.List;

    last if result =:= IterationEnd;
  }.lazy-if($is-lazy)
}

For kicks let's try it against an infinite list of the fibonacci sequence

my $fib = (1, 1, *+* ... *);
my @output = at-a-time( $fib, 3 );
.say for @output[^5]; # just print out the first 5

(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)

Notice that I used $fib instead of @fib. It was to prevent Perl6 from caching the elements of the Fibonacci sequence.
It might be a good idea to put it into a subroutine to create a new sequence everytime you need one, so that the values can get garbage collected when you are done with them.
I also used .is-lazy and .lazy-if to mark the output sequence lazy if the input sequence is. Since it was going into an array @output it would have tried to generate all of the elements from an infinite list before continuing onto the next line.

Wait a minute, I just remembered .rotor.

my @output = $fib.rotor(3);

.say for @output[^5]; # just print out the first 5

(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)

.rotor is actually far more powerful than I've demonstrated.

If you want it to return a partial match at the end you will need to add a :partial to the arguments of .rotor.

user2449151 · Accepted Answer · 2020-10-01 15:00:32Z

3

Use the spart function from the List::NSect package on CPAN.

    perl -e '
    use List::NSect qw{spart};
    use Data::Dumper qw{Dumper};
    my @array = ("foo", "bar", "qux", "foo1", "bar", "qux2", 3, 4, 5);
    my $var = spart(3, @array);
    print Dumper $var;
    '

    $VAR1 = [
          [
            'foo',
            'bar',
            'qux'
          ],
          [
            'foo1',
            'bar',
            'qux2'
          ],
          [
            3,
            4,
            5
          ]
        ];

edited Oct 1, 2020 at 15:00

answered May 23, 2014 at 0:53

user2449151

911 silver badge4 bronze badges

Comments

user1981275 · Accepted Answer · 2014-04-24 10:26:19Z

1

Below a more generic solution to the problem:

my @array = ('foo', 'bar', 1, 2);
my $n = 3;
my @VAR = map { [] } 1..$n;
my @idx = sort map { $_ % $n } 0..$#array;

for my $i ( 0..$#array ){
        push @VAR[ $idx[ $i ] ], @array[ $i ];
}

This also works when the number of items in the array is not a factor of 3. In the above example, the other solutions with e.g. splice would produce two arrays of length 2 and one of length 0.

answered Apr 24, 2014 at 10:26

user1981275

13.5k11 gold badges78 silver badges105 bronze badges

Collectives™ on Stack Overflow

How can I partition a Perl array into equal sized chunks?

9 Answers 9

4 Comments

4 Comments

Comments

4 Comments

Comments

2 Comments

As a learning experience I decided to do this in Perl6

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

9 Answers 9

4 Comments

4 Comments

Comments

4 Comments

Comments

2 Comments

As a learning experience I decided to do this in Perl6

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related