Apply a regex to every element of an array

Question

I have an array in Perl with values like this:

$Array[0] = "[a][b][c] good bad";
$Array[1] = "[d] apple";
$Array[2] = "[e][f] mango ";
$Array[3] = "[g] capgemini";

I need a regular expression that finds all the text between [].
I have written this:

my @matched = grep {$_ ne ""} map { m/\[(.*?)\]/; $1; } @Array;

However, this finds only the first match in each element, such as a from $Array[0] and e from $Array[2].
I want to get all of them, e.g. a, b and c from $Array[0].

Do you actually have a hash reference in each array element, or did you put some kind of quotation around the curly brackets { }? – TLP, commented Apr 9, 2013 at 8:52
I used {} so as not to confuse them with []. It is just an array, and each string within " " is one of its elements. – Somnath Paul, commented Apr 9, 2013 at 9:02
From now on, please post your example data either as valid Perl code or in a well-recognized data format. You could use the output of Data::Dumper, Data::Printer, JSON, or YAML. – Brad Gilbert, commented Apr 9, 2013 at 16:52

Sinan Ünür · Accepted Answer · 2013-04-09 14:17:13Z

4

Your usage of anonymous hashes and omission of sigils is confusing. This works for me, though:

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

my @Array;
$Array[0]= "[a][b][c] good bad";
$Array[1]= "[d] apple";
$Array[2]= "[e][f] mango ";
$Array[3]= "[g] capgemini";
my @matched = map { m/\[(.*?)\]/g } @Array;
print Dumper \@matched;

The main trick is to use the /g modifier for global matching and to let the match, which map evaluates in list context, return all of its captures.
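A minimal sketch contrasting the question's pattern without /g (so each map iteration yields only $1 from the first match) with the /g version in list context. The ? $1 : () form is an illustrative variant of the question's code that simply skips elements with no match; the names @first_only and @all are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @Array = ("[a][b][c] good bad", "[d] apple", "[e][f] mango ", "[g] capgemini");

# Without /g: the match runs once per element, so $1 holds only the first capture.
my @first_only = map { m/\[(.*?)\]/ ? $1 : () } @Array;

# With /g in list context: the match returns every capture in the element,
# and map flattens them all into one list.
my @all = map { m/\[(.*?)\]/g } @Array;

print "first only: @first_only\n";   # first only: a d e g
print "all: @all\n";                 # all: a b c d e f g
```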

edited Apr 9, 2013 at 14:17

Sinan Ünür


answered Apr 9, 2013 at 8:54

choroba



2 Comments

Somnath Paul Over a year ago

Some elements can contain an empty pair of brackets, [], and we don't want empty elements in the final array. So I think we can still use the grep too.

TLP Over a year ago

@SomnathPaul Use .+? instead to avoid capturing empty elements. Or avoid the non-greedy capture altogether and use [^]]+ instead.
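A sketch comparing the three ways of dropping empty brackets mentioned in these comments, run on a made-up string "[a][][b]". It suggests one caveat worth checking: with .+? an empty pair can make the match run on into the next bracket, while the negated class (written [^\]]+ below, equivalent to [^]]+) and the grep filter both skip the empty pair cleanly.

```perl
use strict;
use warnings;

my $s = "[a][][b]";   # made-up test string containing an empty pair

# 1. Capture everything, then drop empty strings with grep.
my @with_grep = grep { length } $s =~ /\[(.*?)\]/g;

# 2. Require at least one character with a non-greedy .+?
#    Caveat: the match starting at "[]" cannot end at that first "]",
#    so .+? extends across it and captures "][b".
my @non_greedy_plus = $s =~ /\[(.+?)\]/g;

# 3. Require at least one non-"]" character: the empty pair never matches.
my @negated_class = $s =~ /\[([^\]]+)\]/g;

print "grep: @with_grep\n";               # grep: a b
print "non-greedy: @non_greedy_plus\n";   # non-greedy: a ][b
print "negated class: @negated_class\n";  # negated class: a b
```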

halfer · Accepted Answer · 2025-07-16 22:32:00Z

-2

This situation is a really big gotcha for regular expressions. In scalar context (for example, as the condition of an if or a while), an m//g global match does NOT proceed to the end of the string by itself. This is intended behavior: each evaluation finds only the next occurrence, returns true, and stops there. (In list context, as in the map of the other answer, m//g returns all the matches at once.)

If you want a scalar-context m//g match to continue to the end of the string, put it in a while loop, like so:

while( m/\[(.*?)\]/g ){ print "$1\n"; }

This works because an m//g match in scalar context keeps returning true until it no longer matches; then it returns false and the loop ends. Behind the scenes, Perl keeps a pos value for each string variable it matches against. After a successful match, pos is updated to the position directly after the match, and the next iteration of the while loop begins searching from there. After a failed match, the search position is reset (pos returns undef), so the next m//g starts again from the beginning of the string.
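A small sketch of this bookkeeping on a single made-up string: it records pos after each successful scalar-context match and then checks what pos reports once the match finally fails (undef, per perldoc -f pos, rather than 0).

```perl
use strict;
use warnings;

my $s = "[a][b][c] good bad";
my @positions;

# Each successful m//g in scalar context resumes from pos($s)
# and then moves pos($s) to just after the match.
while ($s =~ /\[(.*?)\]/g) {
    push @positions, pos($s);
    print "matched '$1', pos is now ", pos($s), "\n";
}

# The final, failed match resets the search position:
# pos($s) is undef, so the next m//g would start at offset 0.
my $after_failure = pos($s);
print "after the loop pos is ", (defined $after_failure ? $after_failure : "undef"), "\n";
```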

Here is code illustrating this process and showing how the pos value works behind the scenes:

#!/usr/bin/perl
use strict;
use warnings;

my @strings = ("[a][b][c] good bad", "[d] apple", "[e][f] mango ", "[g] capgemini",
               "[h] then text [i]", "text first [j][k][l]",
               "[more][than][one][letter]", "[more than one word]");
for (@strings) {
  my $i = 1;
  my $p = 0;   # offset where the current m//g search starts
  print "$_:\n";
  while ( /\[(.*?)\]/g ) {
    print "\titer: $i\tpos: $p\ttext: \"$1\"\n";
    $p = pos;  # pos($_) now points just past this match;
               # the next m//g match on this string starts from here
    $i++;
  }
  print "\n";
}

The output looks like this:

$ perl global.match.pl
[a][b][c] good bad:
    iter: 1 pos: 0  text: "a"
    iter: 2 pos: 3  text: "b"
    iter: 3 pos: 6  text: "c"

[d] apple:
    iter: 1 pos: 0  text: "d"

[e][f] mango :
    iter: 1 pos: 0  text: "e"
    iter: 2 pos: 3  text: "f"

[g] capgemini:
    iter: 1 pos: 0  text: "g"

[h] then text [i]:
    iter: 1 pos: 0  text: "h"
    iter: 2 pos: 3  text: "i"

text first [j][k][l]:
    iter: 1 pos: 0  text: "j"
    iter: 2 pos: 14 text: "k"
    iter: 3 pos: 17 text: "l"

[more][than][one][letter]:
    iter: 1 pos: 0  text: "more"
    iter: 2 pos: 6  text: "than"
    iter: 3 pos: 12 text: "one"
    iter: 4 pos: 17 text: "letter"

[more than one word]:
    iter: 1 pos: 0  text: "more than one word"

This can be a frustrating bug to find, because many people are unaware of pos and what it does. It means that every scalar-context m//g match has a side effect: the pos value of the target string changes behind the scenes, which produces unexpected behavior if you don't know about it. If pos has changed and you want to reset it to the beginning of the string, use the rather strange-looking syntax:

pos($string) = 0;
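A sketch of that side effect under a made-up scenario: a "does this string contain brackets?" test written with /g by mistake moves pos, so a later while loop over the same string silently skips the first match unless pos is rewound first.

```perl
use strict;
use warnings;

my $s = "[x][y][z]";

# The mistaken /g test runs in scalar (boolean) context,
# so it moves pos($s) to just after "[x]" (offset 3).
my $has_brackets = ($s =~ /\[.*?\]/g) ? 1 : 0;

# Without a rewind, this loop resumes at offset 3 and skips "x".
my @skipped;
while ($s =~ /\[(.*?)\]/g) { push @skipped, $1 }

# Same test again (pos was reset by the loop's failed match),
# this time followed by an explicit rewind.
$has_brackets = ($s =~ /\[.*?\]/g) ? 1 : 0;
pos($s) = 0;   # back to the start of the string
my @all;
while ($s =~ /\[(.*?)\]/g) { push @all, $1 }

print "without reset: @skipped\n";   # without reset: y z
print "with reset: @all\n";          # with reset: x y z
```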

This is not exactly intuitive, but the while loop above will probably give the results you intended. Note that an s///g global substitution DOES automatically proceed to the end of the string, replacing every occurrence. So m//g (in scalar context) and s///g behave differently, which adds to the confusion.
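A sketch of that difference on one made-up string: s///g performs every replacement in a single call, m//g in list context returns every capture in a single call, and only m//g in scalar context steps through the matches one at a time.

```perl
use strict;
use warnings;

my $s = "[a][b][c] good bad";

# s///g replaces every occurrence in one call and returns the number of replacements.
my $copy = $s;
my $replaced = ($copy =~ s/\[(.*?)\]/<$1>/g);

# m//g in list context returns all captures in one call.
my @captures = ($s =~ /\[(.*?)\]/g);

# m//g in scalar context returns true once per match, advancing pos each time.
my $steps = 0;
$steps++ while $s =~ /\[(.*?)\]/g;

print "$copy\n";               # <a><b><c> good bad
print "replaced: $replaced\n"; # replaced: 3
print "captures: @captures\n"; # captures: a b c
print "steps: $steps\n";       # steps: 3
```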

Here is the documentation for the pos function, from perldoc -f pos:

$ perldoc -f pos

pos     Returns the offset of where the last "m//g" search left off for
        the variable in question ($_ is used when the variable is not
        specified). This offset is in characters unless the
        (no-longer-recommended) "use bytes" pragma is in effect, in
        which case the offset is in bytes. Note that 0 is a valid match
        offset. "undef" indicates that the search position is reset
        (usually due to match failure, but can also be because no match
        has yet been run on the scalar).

        "pos" directly accesses the location used by the regexp engine
        to store the offset, so assigning to "pos" will change that
        offset <cut>

That should clear up some of the confusion. If you work with regular expressions long enough, you will eventually run into this problem.

edited Jul 16 at 22:32

halfer


answered Mar 14 at 12:03

user3408541


14 Comments

brian d foy Mar 15 at 5:03

There's no such thing as converting one context into another. But also note, you don't need the pos if you use \G.
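The comment doesn't include code, so this is only one possible illustration of the \G assertion, which anchors a match at the current pos of the target string. Combined with /gc, it lets a loop consume the leading run of bracketed tags and then hand the rest of the string to a different pattern, without reading or assigning pos directly. Splitting the string into "tags" and "rest" is an assumption made for the example.

```perl
use strict;
use warnings;

my $s = "[a][b][c] good bad";
my @tags;

# \G anchors each attempt at the current pos($s); /g advances pos after a
# match, and /c keeps pos where it is when the match finally fails.
while ($s =~ /\G\[([^\]]*)\]/gc) {
    push @tags, $1;
}

# Continue from where the tag loop stopped: skip spaces, take the remainder.
my ($rest) = $s =~ /\G\s*(.*)/;

print "tags: @tags\n";   # tags: a b c
print "rest: $rest\n";   # rest: good bad
```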

user3408541 Mar 15 at 5:52

$s =~ m//g returns a true/false scalar by default; however, you can "convert" or change it to list context, where it will return a list of all matches and proceed to the end of the string, which it normally would not do. So yes, you can convert or change one context into another. This is what the other answer suggested. However, it returns a single list of matches for all strings, with no way to tell which match came from which string. This is probably not what most people will want. The while( m//g ){} syntax is probably what people will want most of the time.

brian d foy Mar 16 at 5:26

Sorry, but no. We don't know what $s =~ m//g returns until we know the context. You don't "convert" anything. There's no "normally". You don't start out with some default and change it to another. And there are plenty of ways to tell which matches come from which string; that answer simply didn't care to distinguish them.
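One way to illustrate that point: keep the list-context m//g from the first answer, but wrap each element's captures in its own array reference instead of flattening them. The @per_element layout is an illustrative choice, not something either answer proposed; index $i of the result corresponds to $Array[$i].

```perl
use strict;
use warnings;

my @Array = ("[a][b][c] good bad", "[d] apple", "[e][f] mango ", "[g] capgemini");

# One array reference per element, so $per_element[$i] holds the captures
# found in $Array[$i]; order and duplicate strings are preserved.
my @per_element = map { [ m/\[(.*?)\]/g ] } @Array;

for my $i (0 .. $#Array) {
    print "$Array[$i] => @{ $per_element[$i] }\n";
}
```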

user3408541 Mar 16 at 8:35

I have detailed what happens above in both list and scalar context. You can easily convert one context to another. If you do no conversion, Perl will by default return scalar context, a true/false value. To change this to list context, you would have to assign the expression to a list, like so: @list = $s =~ m//g. By default, Perl will assume it is a scalar. The context will be immediately obvious at first glance. Basically: is the l-value a scalar or a list? If it stands alone, like the $s =~ m//g you have written above, this is the default scalar context.

brian d foy Mar 16 at 11:23

Nope, there is no default context. There is only the context it is, and in some cases you have to know. I've detailed it in my book Learning Perl :)
