19

I have a fixed-sized array where the size of the array is always in factor of 3.

my @array = ('foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5);

How can I cluster the member of array such that we can get an array of array group by 3:

$VAR = [ ['foo','bar','qux'],
         ['foo1','bar','qux2'],
         [3, 4, 5] ];
2
  • 3
    Watch out, all the splice based options below are destructive to your array. You will need to work on a copy if you want to preserve your original array. Commented Sep 29, 2009 at 23:38
  • 2
    This is a very important note re: splice. Addedum: natatime is implemented using splice as well, so is subject to the above note. Commented Sep 30, 2009 at 12:27

9 Answers 9

38
my @VAR;
push @VAR, [ splice @array, 0, 3 ] while @array;

or you could use natatime from List::MoreUtils

use List::MoreUtils qw(natatime);

my @VAR;
{
  my $iter = natatime 3, @array;
  while( my @tmp = $iter->() ){
    push @VAR, \@tmp;
  }
}
Sign up to request clarification or add additional context in comments.

4 Comments

@Brad - +1 for List::MoreUtils - it's a great gem all-around even outside of this answer.
Also, please note that - at least as of 2/2009 - there was a memory leak in XS version of natatime (no leak in PP version). See perlmonks.org/?node_id=742364
In Perl 6 you could write it as @array.rotor(3,0)
The design for .rotor in Perl 6 has changed, you would just write it as @array.rotor(3). If you need to get the last few elements when the list doesn't split evenly you would add a :partial to the method call.
7

I really like List::MoreUtils and use it frequently. However, I have never liked the natatime function. It doesn't produce output that can be used with a for loop or map or grep.

I like to chain map/grep/apply operations in my code. Once you understand how these functions work, they can be very expressive and very powerful.

But it is easy to make a function to work like natatime that returns a list of array refs.

sub group_by ($@) {
    my $n     = shift;
    my @array = @_;

    croak "group_by count argument must be a non-zero positive integer"
        unless $n > 0 and int($n) == $n;

    my @groups;
    push @groups, [ splice @array, 0, $n ] while @array;

    return @groups;
}

Now you can do things like this:

my @grouped = map [ reverse @$_ ],
              group_by 3, @array;

** Update re Chris Lutz's suggestions **

Chris, I can see merit in your suggested addition of a code ref to the interface. That way a map-like behavior is built in.

# equivalent to my map/group_by above
group_by { [ reverse @_ ] } 3, @array;

This is nice and concise. But to keep the nice {} code ref semantics, we have put the count argument 3 in a hard to see spot.

I think I like things better as I wrote it originally.

A chained map isn't that much more verbose than what we get with the extended API. With the original approach a grep or other similar function can be used without having to reimplement it.

For example, if the code ref is added to the API, then you have to do:

my @result = group_by { $_[0] =~ /foo/ ? [@_] : () } 3, @array;

to get the equivalent of:

my @result = grep $_->[0] =~ /foo/,
             group_by 3, @array;

Since I suggested this for the sake of easy chaining, I like the original better.

Of course, it would be easy to allow either form:

sub _copy_to_ref { [ @_ ] }

sub group_by ($@) {
    my $code = \&_copy_to_ref;
    my $n = shift;

    if( reftype $n eq 'CODE' ) {
        $code = $n;
        $n = shift;
    }

    my @array = @_;

    croak "group_by count argument must be a non-zero positive integer"
        unless $n > 0 and int($n) == $n;

    my @groups;
    push @groups, $code->(splice @array, 0, $n) while @array;

    return @groups;
}

Now either form should work (untested). I'm not sure whether I like the original API, or this one with the built in map capabilities better.

Thoughts anyone?

** Updated again **

Chris is correct to point out that the optional code ref version would force users to do:

group_by sub { foo }, 3, @array;

Which is not so nice, and violates expectations. Since there is no way to have a flexible prototype (that I know of), that puts the kibosh on the extended API, and I'd stick with the original.

On a side note, I started with an anonymous sub in the alternate API, but I changed it to a named sub because I was subtly bothered by how the code looked. No real good reason, just an intuitive reaction. I don't know if it matters either way.

4 Comments

Why not have group_by take a code reference as the first argument, so we can determine what to do with our group? Usage: group_by { [ @_ ] } 3, @array;
Ideal syntax would be group_by 3 { [ @_ ] } @array; but of course we'd need to explicitly declare the anonymous sub for Perl not to whine.
The only problem with the second version that uses an optional code reference is that the map { code } @list syntax only works if the subroutine is prototyped to have the first argument be a code reference. As written, you would need to explicitly specify that the code block was a sub (or declare the sub somewhere else and pass a reference to it). Also, I wouldn't have bothered writing a named subroutine for _copy_to_ref() and just said my $code = sub { [ @_ ] }; but that's just me. It might be more efficient to do it your way.
I agree about natatime offering a very limited (and a decidedly un-ModernPerl) API. No chaining, no easy iteration with counting, etc.
5

Or this:

my $VAR;
while( my @list = splice( @array, 0, 3 ) ) {
    push @$VAR, \@list;
}

Comments

5

Another answer (a variation on Tore's, using splice but avoiding the while loop in favor of more Perl-y map)

my $result = [ map { [splice(@array, 0, 3)] } (1 .. (scalar(@array) + 2) % 3) ];

4 Comments

I wouldn't call it more Perl-y just because it uses map() - it's largely much more cluttered and harder to grok. The most "Perl-y" solution is natatime() because it's from CPAN.
Hmm... I can't say I greatly disagree with you re: possibly harder to grok. But having been a professional Perl developer for many years, I have encountered enough bad-to-horrible junk on CPAN that I don't necessarily consider "uses something from CPAN" to be a Good Householding Seal Of Approval of a perl solution. Mind you, List::MoreUtils, from my cursory examination today, appears to be a very neat and useful module, so it is definitely not included in the gripe above :)
@DVK - When I say "because it's from CPAN," I'm lovingly poking fun at the trends of my favorite language, not offering it as the be-all end-all of solutions. We really need to find a way to express sarcasm on the internets.
Sorry. After 2 sleepless nights, my sarcasm module is not loading.
4

Another generic solution, non-destructive to the original array:

use Data::Dumper;

sub partition {
    my ($arr, $N) = @_; 

    my @res;
    my $i = 0;

    while ($i + $N-1 <= $#$arr) {
        push @res, [@$arr[$i .. $i+$N-1]];
        $i += $N; 
    }   

    if ($i <= $#$arr) {
        push @res, [@$arr[$i .. $#$arr]];
    }   
    return \@res;
}

print Dumper partition(
    ['foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5], 
    3   
);

The output:

$VAR1 = [
          [
            'foo',
            'bar',
            'qux'
          ],
          [
            'foo1',
            'bar',
            'qux2'
          ],
          [
            3,
            4,
            5
          ]
        ];

Comments

3

Try this:

$VAR = [map $_ % 3 == 0 ? ([ $array[$_], $array[$_ + 1], $array[$_ + 2] ]) 
                        : (),
            0..$#array];

2 Comments

I'm not sure whether +1 it for cuteness or -1 it for sheer hackiness :) Unvoted it stays.
-1, because I would 100% make a mistake somewhere in there :)
3

As a learning experience I decided to do this in Perl6

The first, perhaps most simplest way I tried was to use map.

my @output := @array.map: -> $a, $b?, $c? { [ $a, $b // Nil, $c // Nil ] };
.say for @output;
foo bar qux
foo1 bar qux2
3 4 5

That didn't seem very scalable. What if I wanted to take the items from the list 10 at a time, that would get very annoying to write. ... Hmmm I did just mention "take" and there is a keyword named take lets try that in a subroutine to make it more generally useful.

sub at-a-time ( Iterable \sequence, Int $n where $_ > 0 = 1 ){
  my $is-lazy = sequence.is-lazy;
  my \iterator = sequence.iterator;

  # gather is used with take
  gather loop {
    my Mu @current;
    my \result = iterator.push-exactly(@current,$n);

    # put it into the sequence, and yield
    take @current.List;

    last if result =:= IterationEnd;
  }.lazy-if($is-lazy)
}

For kicks let's try it against an infinite list of the fibonacci sequence

my $fib = (1, 1, *+* ... *);
my @output = at-a-time( $fib, 3 );
.say for @output[^5]; # just print out the first 5
(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)

Notice that I used $fib instead of @fib. It was to prevent Perl6 from caching the elements of the Fibonacci sequence.
It might be a good idea to put it into a subroutine to create a new sequence everytime you need one, so that the values can get garbage collected when you are done with them.
I also used .is-lazy and .lazy-if to mark the output sequence lazy if the input sequence is. Since it was going into an array @output it would have tried to generate all of the elements from an infinite list before continuing onto the next line.


Wait a minute, I just remembered .rotor.

my @output = $fib.rotor(3);

.say for @output[^5]; # just print out the first 5
(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)

.rotor is actually far more powerful than I've demonstrated.

If you want it to return a partial match at the end you will need to add a :partial to the arguments of .rotor.

Comments

3

Use the spart function from the List::NSect package on CPAN.

    perl -e '
    use List::NSect qw{spart};
    use Data::Dumper qw{Dumper};
    my @array = ("foo", "bar", "qux", "foo1", "bar", "qux2", 3, 4, 5);
    my $var = spart(3, @array);
    print Dumper $var;
    '

    $VAR1 = [
          [
            'foo',
            'bar',
            'qux'
          ],
          [
            'foo1',
            'bar',
            'qux2'
          ],
          [
            3,
            4,
            5
          ]
        ];

Comments

1

Below a more generic solution to the problem:

my @array = ('foo', 'bar', 1, 2);
my $n = 3;
my @VAR = map { [] } 1..$n;
my @idx = sort map { $_ % $n } 0..$#array;

for my $i ( 0..$#array ){
        push @VAR[ $idx[ $i ] ], @array[ $i ];
}

This also works when the number of items in the array is not a factor of 3. In the above example, the other solutions with e.g. splice would produce two arrays of length 2 and one of length 0.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.