After reading this similar question and trying my code several times, I keep getting the same undesired output.

Let's assume the string I'm searching is "I saw wilma yesterday". The regex should capture each word up to and including an 'a', plus up to 5 characters (spaces included) that follow that 'a'.

The code I wrote is the following:

$_ = "I saw wilma yesterday";

if (@m = /(\w+)a(.{5,})?/g){
    print "found " . @m . " matches\n";

    foreach(@m){
        print "\t\"$_\"\n";
    }
}

However, I kept on getting the following output:

found 2 matches
    "s"
    "w wilma yesterday"

while I expected to get the following one:

found 3 matches
    "saw wil"
    "wilma yest"
    "yesterday"

Eventually I found out that @m holds the values of $1 and $2 rather than the whole matches, as the output above shows.

Now, since the /g flag is on, and I don't think the problem is about the regex, how could I get the desired output?

  • I don't understand why you have day in your expected result and not yesterday? Commented Jul 10, 2013 at 21:09
  • Well, you're right. Just edited. Commented Jul 10, 2013 at 21:10
  • OK, I have a pattern for you. Commented Jul 10, 2013 at 21:11

3 Answers

You can try this pattern that allows overlapped results:

(?=\b(\w+a.{1,5}))

or

(?=(?i)\b([a-z]+a.{0,5}))

example:

use strict;
my $str = "I saw wilma yesterday";
my @matches = ($str =~ /(?=\b([a-z]+a.{0,5}))/gi);
print join("\n", @matches),"\n";

more explanations:

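A regex normally can't return overlapping results: once a character has been consumed ("eaten") by a match, the next /g search starts after it, so it can't be consumed a second time. The trick to get around this constraint is to wrap the whole pattern in a lookahead (a zero-width assertion that checks what follows without consuming anything) and to put a capturing group inside it. Because the match itself is empty, the engine retries at each successive position, and the capture group can pick up text that overlaps an earlier capture.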
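To make the difference concrete, here is a minimal sketch that runs the same pattern once as an ordinary consuming match and once inside a lookahead. The variable names @consumed and @overlapped are only illustrative, and the `.{0,5}` quantifier is taken from the second pattern above.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";

# Consuming match: "saw wil" eats the start of "wilma", so the next
# search resumes in the middle of that word and skips it.
my @consumed = $str =~ /\b(\w+a.{0,5})/g;

# Lookahead match: the overall match is zero-width, so nothing is eaten
# and the capture group may overlap text that was captured before.
my @overlapped = $str =~ /(?=\b(\w+a.{0,5}))/g;

print "consumed:   ", join(" | ", @consumed), "\n";
print "overlapped: ", join(" | ", @overlapped), "\n";
```

The consuming version should report only "saw wil" and "yesterday", while the lookahead version also finds "wilma yest".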

For another example of this behaviour, you can try the example code without the word boundary (\b) to see the result.
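As a rough sketch of that experiment, the snippet below runs the lookahead pattern with and without \b on the same string. The array names are made up for illustration.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";

# With \b the capture can only begin at the start of a word.
my @with_boundary = $str =~ /(?=\b([a-z]+a.{0,5}))/gi;

# Without \b the lookahead also succeeds in the middle of a word,
# so every suffix that still reaches an 'a' is reported too.
my @without_boundary = $str =~ /(?=([a-z]+a.{0,5}))/gi;

print scalar(@with_boundary), " matches with \\b\n";
print scalar(@without_boundary), " matches without \\b:\n";
print "\t\"$_\"\n" for @without_boundary;
```

Without the boundary the list also contains fragments such as "ilma yest" and "day", because the engine tries the lookahead at every position rather than only at word starts.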

4 Comments

I tested this and it worked. Well done. while ( m/(?=\b(\w+a.{1,5}))/g ) { print "$1\n"; }
Yeah, it works properly. But isn't "?=" the lookahead construct? Why is it necessary?
@none: the lookahead allows overlapping matches, so you can get overlapping results (such as saw wil and wilma yest) in one pass. See the example.
Yeah, I noticed it. Thanks anyway!
1

First, capture the whole expression (the word up to its 'a' plus up to five following characters) in a single group, i.e.:

/(\w+a.{0,5})/

Next, restart each search one character past the position where the previous match began.

The pos() function allows you to specify where a /g regex starts its search from.
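One possible way to put these two steps together is sketched below. It relies on two things this answer does not mention, so treat them as assumptions: the start offset of the last match is read from `$-[0]` (Perl's built-in `@-` array), and a `\b` is added so that a restarted search can't begin in the middle of a word. The array name @found is only illustrative.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";
my @found;

while ($str =~ /\b(\w+a.{0,5})/g) {
    push @found, $1;
    # $-[0] is the offset where the whole match began; resume the
    # next search one character after it instead of after the match.
    pos($str) = $-[0] + 1;
}

print "\"$_\"\n" for @found;
```

Without the `\b`, the restarted search would also pick up fragments such as "ilma yest", since it begins one character into the word it has just matched.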


1
my $s = "I saw wilma yesterday";
while ($s =~ /(\w+a(.{0,5}))/g) {
    print "\t\"$1\"\n";
    # Rewind over the trailing characters ($2) so the next search can reuse them.
    pos($s) = pos($s) - length($2);
}

Gives you:

"saw wil"
"wilma yest"
"yesterday"

But I don't know why you should get day and not yesterday.
