3

I have an array in Perl with values like this:

$Array[0] = "[a][b][c] good bad";
$Array[1] = "[d] apple";
$Array[2] = "[e][f] mango ";
$Array[3] = "[g] capgemini";

I need a regular expression that finds all the text between [ and ].
I have written this:

my @matched = grep {$_ ne ""} map { m/\[(.*?)\]/; $1; } @Array;

However, this finds only the first match in each element, such as a from $Array[0] and e from $Array[2].
I want to get all of them, such as a, b, c from $Array[0].

3
  • Do you actually have a hash reference in each array element, or did you put some kind of quotation around the curly brackets { } ? Commented Apr 9, 2013 at 8:52
  • I used {} so as not to confuse them with []. It is just an array, and each line within " " is one of its elements. Commented Apr 9, 2013 at 9:02
  • From now on please post your example data as either valid Perl code, or as a well recognized data format. You could use the output of Data::Dumper, Data::Printer, JSON, YAML. Commented Apr 9, 2013 at 16:52

2 Answers

4

Your usage of anonymous hashes and omission of sigils is confusing. This works for me, though:

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

my @Array;
$Array[0]= "[a][b][c] good bad";
$Array[1]= "[d] apple";
$Array[2]= "[e][f] mango ";
$Array[3]= "[g] capgemini";
my @matched = map { m/\[(.*?)\]/g } @Array;
print Dumper \@matched;

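The main trick is to use the /g option for global matching. Because map evaluates its block in list context, the match returns every captured substring instead of a true/false value, and map collects them all into @matched.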
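
As a minimal sketch of the list-context behavior on a single string: a list assignment puts m//g in list context, so all captures come back at once, and a string with no match yields an empty list. The helper name bracketed is made up for illustration and is not part of the answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns every bracketed token found in one string.
sub bracketed {
    my ($string) = @_;
    # The list assignment gives m//g list context, so it returns all captures.
    my @tokens = $string =~ /\[(.*?)\]/g;
    return @tokens;
}

my @first  = bracketed("[a][b][c] good bad");
my @second = bracketed("[d] apple");
my @none   = bracketed("no brackets here");

print "first:  @first\n";
print "second: @second\n";
print "none:   ", scalar(@none), " tokens\n";
```

The same regex in a plain if condition would be in scalar context and would find only one match per evaluation.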


2 Comments

We can have some like [], which is empty. But in our final array we don't want empty elements, so we can use the grep too, I think.
@SomnathPaul Use .+? instead to avoid capturing empty elements. Or avoid using non-greedy capture and use [^]]+
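
A small sketch comparing the options from the two comments above, using a made-up sample line that contains empty bracket pairs. One caveat it illustrates: .+? requires one character, but that character can be "]" itself, so a capture may run across an empty pair. The negated class and the grep filter do not have that problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line (made up for illustration) with two empty bracket pairs.
my $line = "[a][][b] text [] [c]";

# .*? also matches the empty pair, so empty strings come back.
my @lazy_star = $line =~ /\[(.*?)\]/g;

# .+? needs one character, but that character may be "]" itself,
# so a capture can run across an empty pair.
my @lazy_plus = $line =~ /\[(.+?)\]/g;

# [^\]]+ needs at least one character that is not "]", so "[]" is skipped.
my @negated = $line =~ /\[([^\]]+)\]/g;

# Alternative: keep .*? and drop the empty strings afterwards.
my @filtered = grep { length } $line =~ /\[(.*?)\]/g;

print "lazy star:     ", join("|", @lazy_star), "\n";
print "lazy plus:     ", join("|", @lazy_plus), "\n";
print "negated class: ", join("|", @negated),   "\n";
print "grep filter:   ", join("|", @filtered),  "\n";
```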
-2

This situation is a common gotcha with regular expressions. In scalar context, an m//g global match will NOT proceed to the end of the string by itself. This is intended behavior: each scalar-context m//g match finds only the next occurrence, returns true, and stops there. (In list context, as in the other answer, it returns all the matches at once.)

If you want an m//g global match to continue to the end of the string, you have to put it in a while loop like so...

while( m/\[(.*?)\]/g ){ print "$1\n"; }

The way this works is that an m//g match in scalar context returns true each time it finds another match. Once it no longer matches, it returns false and the loop ends. Behind the scenes, Perl keeps a pos value for each string. After a match, the pos value is updated to the position directly after the match, and the next iteration of the while loop begins searching from this position. After a failed match, the pos value is reset to undef, so the next search starts again from the beginning of the string.

Here is the code illustrating this process, and showing how the pos value is working behind the scenes...

#!/usr/bin/perl -w

my @strings = ("[a][b][c] good bad","[d] apple","[e][f] mango ","[g] capgemini", 
               "[h] then text [i]", "text first [j][k][l]", 
               "[more][than][one][letter]","[more than one word]");
for(@strings){
  my $i = 1;
  my $p = 0;
  print "$_:\n";
  while( /\[(.*?)\]/g ){
    print "\titer: $i\tpos: $p\ttext: \"$1\"\n";
    $p = pos; #pos value changes after each m//g global match
              #the next m//g match on this string will always start from this position
    $i++;
  }
  print "\n";
}

Output looks like this...

$ perl global.match.pl
[a][b][c] good bad:
    iter: 1 pos: 0  text: "a"
    iter: 2 pos: 3  text: "b"
    iter: 3 pos: 6  text: "c"

[d] apple:
    iter: 1 pos: 0  text: "d"

[e][f] mango :
    iter: 1 pos: 0  text: "e"
    iter: 2 pos: 3  text: "f"

[g] capgemini:
    iter: 1 pos: 0  text: "g"

[h] then text [i]:
    iter: 1 pos: 0  text: "h"
    iter: 2 pos: 3  text: "i"

text first [j][k][l]:
    iter: 1 pos: 0  text: "j"
    iter: 2 pos: 14 text: "k"
    iter: 3 pos: 17 text: "l"

[more][than][one][letter]:
    iter: 1 pos: 0  text: "more"
    iter: 2 pos: 6  text: "than"
    iter: 3 pos: 12 text: "one"
    iter: 4 pos: 17 text: "letter"

[more than one word]:
    iter: 1 pos: 0  text: "more than one word"

This is a frustrating bug to find because most people are unaware of pos and what it does. The way this works means every m//g search has side effects. Basically the pos value is changing behind the scenes and will produce unexpected behavior if you are unaware of how this works. If the pos value has changed, and you want to reset the value to the beginning of the string, you would have to use the rather strange looking syntax...

pos($string) = 0;

This situation is not exactly intuitive, but using the above while loop syntax will probably get the results you were intending. Note that an s///g global match WILL automatically proceed to the end of the string. So m//g and s///g behave slightly differently, which adds to the confusion.
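
The side effect described above can be sketched in a few lines. This example is an illustration rather than part of the original answer; the helper pos_text and the sample string are made up. It shows pos being undef before any match and after a failed one, a stray m//g causing the next search to skip a match, and an assignment to pos rewinding the search.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "[a][b] text";
my @log;

# Hypothetical helper: render a pos value, which may be undef.
sub pos_text { my ($p) = @_; return defined $p ? $p : 'undef' }

push @log, "start pos=" . pos_text(pos($string));

# Three scalar-context matches: two succeed, the third fails and resets pos.
for my $attempt (1 .. 3) {
    my $found = $string =~ /\[(.*?)\]/g ? $1 : 'none';
    push @log, "attempt $attempt: $found pos=" . pos_text(pos($string));
}

# A stray m//g elsewhere moves pos, so the next search skips "[a]".
$string =~ /\]/g;
my $skipped = $string =~ /\[(.*?)\]/g ? $1 : 'none';

# Assigning to pos rewinds the search to the start of the string.
pos($string) = 0;
my $rewound = $string =~ /\[(.*?)\]/g ? $1 : 'none';

push @log, "after stray match: $skipped", "after reset: $rewound";
print "$_\n" for @log;
```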

Here is some documentation about the pos function, from perldoc -f pos:

$ perldoc -f pos

pos     Returns the offset of where the last "m//g" search left off for
        the variable in question ($_ is used when the variable is not
        specified). This offset is in characters unless the
        (no-longer-recommended) "use bytes" pragma is in effect, in
        which case the offset is in bytes. Note that 0 is a valid match
        offset. "undef" indicates that the search position is reset
        (usually due to match failure, but can also be because no match
        has yet been run on the scalar).

        "pos" directly accesses the location used by the regexp engine
        to store the offset, so assigning to "pos" will change that
        offset <cut>

That should clear up the confusion some. If you work with regular expressions long enough, you will eventually run into this problem.

14 Comments

There's no such thing as converting one context into another. But also note, you don't need the pos if you use \G.
$s =~ m//g returns a true/false scalar by default, however you can "convert" or change it to list context where it will return a list of all matches and proceed to the end of the string which it normally would not do. So yes, you can convert or change one context into another. This is what the other answer suggested. However it returns a single list of matches for all strings, with no way to tell which match is from which string. This is probably not what most people will want. The while( m//g ){} syntax is probably what people will want most of the time.
Sorry but no. We don't know what $s =~ m//g returns until we know the context. You don't "convert" anything. There's no "normally". You don't start out with some default and change it to another. And, there are plenty of ways to tell which matches come from which string; that answer simply didn't care to distinguish that.
I have detailed what happens above in both list and scalar context. You can easily convert one context to another. If you do no conversion, Perl will by default return scalar context, a true/false value. To change this to a list context, you would have to assign this expression to a list like so @list = $s =~ m//g. By default, Perl will assume it is a scalar. The context will be immediately obvious on first glance. Basically is the l-value a scalar or a list. If it is alone, like you have written above $s =~ m//g this is the default scalar context.
Nope, there is no default context. There is only the context it is, and in some cases you have to know. I've detailed it in my book Learning Perl :)
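
One possible way to keep matches grouped by source string, as the comments above say can be done, sketched under the assumption that an array of array references is an acceptable layout. The last lines show the same match operator returning a count when its list result is assigned to a scalar through an empty list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @Array = ("[a][b][c] good bad", "[d] apple", "[e][f] mango ", "[g] capgemini");

# One array reference of matches per input string, in input order.
my @per_string = map { my @found = /\[(.*?)\]/g; [@found] } @Array;

for my $i (0 .. $#Array) {
    print "$Array[$i] => @{ $per_string[$i] }\n";
}

# The match runs in list context; assigning that list assignment
# to a scalar yields the number of matches.
my $count = () = $Array[0] =~ /\[(.*?)\]/g;
print "count in first string: $count\n";
```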