This situation is a really big gotcha for regular expressions. In scalar context (for example as the condition of an if), a single m//g global match will NOT proceed to the end of the string by itself. This is intended behavior. Each scalar-context m//g match finds only the next occurrence, returns true, and stops searching there.
If you want an m//g global match to continue to the end of the string, you have to put it in a while loop like so...
while( m/\[(.*?)\]/g ){ print "$1\n"; }
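For comparison, here is a minimal sketch (the sample string and variable names are made up for illustration) of a single scalar-context m//g test next to the same pattern in list context, where m//g hands back every capture at once without a loop:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "[a][b][c]";    # sample data, made up for illustration

# Scalar context: one test finds one match and stops there.
my $first;
if ( $string =~ /\[(.*?)\]/g ) {
    $first = $1;
}
print "scalar context: $first\n";

# Clear the position saved by the scalar-context match above,
# so the next search starts from the beginning of the string.
pos($string) = undef;

# List context: m//g returns all captures in one go.
my @all = $string =~ /\[(.*?)\]/g;
print "list context: @all\n";
```

This should print "scalar context: a" followed by "list context: a b c". The while loop is still the form to use when you need to do work per match or inspect pos along the way.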
The way this works is that a scalar-context m//g match returns true each time it finds another match. Once it no longer matches, it returns false and the loop ends. Behind the scenes Perl keeps a pos value for each string. After a successful match, the pos value is updated to the position directly after the match, and the next iteration of the while loop begins searching from this position. After a failed match, the pos value is reset to undef (unless the /c modifier is used), so the next search starts from the beginning of the string again.
Here is the code illustrating this process, and showing how the pos value is working behind the scenes...
#!/usr/bin/perl -w
use strict;

my @strings = ("[a][b][c] good bad","[d] apple","[e][f] mango ","[g] capgemini",
               "[h] then text [i]", "text first [j][k][l]",
               "[more][than][one][letter]","[more than one word]");

for (@strings) {
    my $i = 1;    # iteration counter
    my $p = 0;    # position the current search started from
    print "$_:\n";
    while ( /\[(.*?)\]/g ) {
        # $p still holds the pos saved before this match, i.e. where this search began
        print "\titer: $i\tpos: $p\ttext: \"$1\"\n";
        $p = pos;    # pos value changes after each m//g global match;
                     # the next m//g match on this string will always start from this position
        $i++;
    }
    print "\n";
}
Output looks like this...
$ perl global.match.pl
[a][b][c] good bad:
iter: 1 pos: 0 text: "a"
iter: 2 pos: 3 text: "b"
iter: 3 pos: 6 text: "c"
[d] apple:
iter: 1 pos: 0 text: "d"
[e][f] mango :
iter: 1 pos: 0 text: "e"
iter: 2 pos: 3 text: "f"
[g] capgemini:
iter: 1 pos: 0 text: "g"
[h] then text [i]:
iter: 1 pos: 0 text: "h"
iter: 2 pos: 3 text: "i"
text first [j][k][l]:
iter: 1 pos: 0 text: "j"
iter: 2 pos: 14 text: "k"
iter: 3 pos: 17 text: "l"
[more][than][one][letter]:
iter: 1 pos: 0 text: "more"
iter: 2 pos: 6 text: "than"
iter: 3 pos: 12 text: "one"
iter: 4 pos: 17 text: "letter"
[more than one word]:
iter: 1 pos: 0 text: "more than one word"
This is a frustrating bug to find because most people are unaware of pos and what it does. The way this works means every m//g search has side effects. Basically the pos value is changing behind the scenes and will produce unexpected behavior if you are unaware of how this works. If the pos value has changed, and you want to reset the value to the beginning of the string, you would have to use the rather strange looking syntax...
pos($string) = 0;
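As a minimal sketch of that side effect (the string, patterns, and variable names are made up for illustration), a leftover pos from one m//g test makes a second m//g test on the same string miss text that sits before the saved position. Assigning to pos puts the search back at the start:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "[x] first, then [y]";    # sample data, made up for illustration

# A scalar-context m//g test leaves pos just past "[y]".
my $found_y = $string =~ /\[y\]/g ? 1 : 0;
my $pos_after_y = pos($string);
print "found [y]: $found_y, pos is now $pos_after_y\n";

# The next m//g test starts from the saved position, so it misses "[x]".
my $found_x = $string =~ /\[x\]/g ? 1 : 0;
my $pos_after_fail = pos($string);    # the failed match clears pos
print "found [x]: $found_x, pos is now ",
    ( defined $pos_after_fail ? $pos_after_fail : 'undef' ), "\n";

# Recreate the stale position, then reset it by hand before searching.
$string =~ /\[y\]/g;
pos($string) = 0;
my $found_x_after_reset = $string =~ /\[x\]/g ? 1 : 0;
print "found [x] after reset: $found_x_after_reset\n";
```

The second test fails even though "[x]" is plainly in the string, which is the kind of surprise described above. If only a yes/no answer is needed, leaving off the /g modifier avoids the saved position entirely.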
This situation is not exactly intuitive, but using the above while loop syntax will probably get the results you were intending. Note that an s///g global substitution WILL automatically proceed to the end of the string, replacing every occurrence in one pass. So m//g and s///g behave slightly differently, which adds to the confusion.
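A small sketch of the difference between m//g and s///g, assuming a made-up sample string and replacement: one scalar-context m//g test counts a single match, while one s///g call rewrites every occurrence and returns how many it replaced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "[a][b][c]";    # sample data, made up for illustration

# One scalar-context m//g test: stops after the first match.
my $match_count = 0;
$match_count++ if $string =~ /\[(.*?)\]/g;

# One s///g call on a copy: replaces every occurrence in a single pass
# and returns the number of substitutions made.
my $copy     = $string;
my $replaced = ( $copy =~ s/\[(.*?)\]/<$1>/g );

print "m//g matched $match_count time(s)\n";
print "s///g replaced $replaced time(s): $copy\n";
```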
Here is some documentation about the pos function from perldoc -f pos
$ perldoc -f pos
pos Returns the offset of where the last "m//g" search left off for
the variable in question ($_ is used when the variable is not
specified). This offset is in characters unless the
(no-longer-recommended) "use bytes" pragma is in effect, in
which case the offset is in bytes. Note that 0 is a valid match
offset. "undef" indicates that the search position is reset
(usually due to match failure, but can also be because no match
has yet been run on the scalar).
"pos" directly accesses the location used by the regexp engine
to store the offset, so assigning to "pos" will change that
offset <cut>
That should clear up the confusion some. If you work with regular expressions long enough, you will eventually run into this problem.