Why doesn't this Perl regex work?

Question

@matches = ( $filestr =~ /^[0-9]+\. (.+\n)*/mg );

I have a file that has been read into $filestr. The regex above should match the beginning of a line, followed by a number, a dot, a space, and then any number of non-empty lines, each ending in a newline (so the match ends at a line that contains only a newline). Yet for some reason it seems to produce just some single lines from the file.

When I do something like

@matches = ( $filestr =~ /^[0-9]+\. .+\n/mg );

I correctly match a single line.

When I do this

@matches = ( $filestr =~ /^[0-9]+\. .+\n.+\n/mg );

I match the same single lines, followed by some seemingly unrelated line. What's wrong with my regex?

Note: The regex works fine in this regex tester: https://regex101.com/. It just won't work in Perl.

Example, in this text:

1. This should
match

2. This should too

3. This
one
also

the regex should match

1. This should
match

and

2. This should too

and

3. This
one
also

Just FYI: when line breaks come into play, consider using \R instead of \n. However, here you'd better change the whole approach and read line by line, checking each subsequent one.
– Wiktor Stribiżew, Commented Nov 29, 2016 at 9:30
Thanks for the suggestion. I just tried \R but I get the same result as with \n.
– Will Bolden, Commented Nov 29, 2016 at 9:32
Do you know of a good way to check line by line the way you suggested? It seems like I would essentially be manually splitting apart the regex: first checking whether a line matches ^[0-9]+\. , then checking whether .+\n matches the rest of the first line and each subsequent line (until I get a line with only a single newline on it, at which point I would have to restart).
– Will Bolden, Commented Nov 29, 2016 at 9:35
I can only suggest a regex fix like /^[0-9]+\..*?(?:\R{2}|\z)/gsm
– Wiktor Stribiżew, Commented Nov 29, 2016 at 9:48
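
The line-by-line approach discussed in these comments could look roughly like the sketch below. It is only an illustration: the sample string and the variable names ($current, @matches) are assumptions, and it is meant to mirror the intent of the regex in the question rather than reproduce any commenter's code.

```perl
use strict;
use warnings;

# Sample input (assumed); in the question this would be the file contents.
my $filestr = "1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

my @matches;
my $current;    # block currently being collected, undef when outside a block

# split /^/ breaks the string into lines and keeps each trailing newline.
for my $line (split /^/, $filestr) {
    if (defined $current) {
        if ($line =~ /^.+\n/) {      # non-empty line: still inside the block
            $current .= $line;
            next;
        }
        push @matches, $current;     # blank line: the block is finished
        undef $current;
    }
    # A line starting with "number, dot, space" opens a new block.
    $current = $line if $line =~ /^[0-9]+\. /;
}
push @matches, $current if defined $current;    # flush the last block

print "[$_]\n" for @matches;
```

For the sample text this collects three blocks, one per numbered item, each including its continuation lines.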

Arunesh Singh · Accepted Answer · 2016-11-29 10:12:55Z

Your regex is right, but you are capturing only part of the result. In list context, a /g match returns the contents of the capture groups rather than the whole match, and a repeated group such as (.+\n)* keeps only its last iteration. I would suggest capturing the whole match in a single group, so that this is what gets stored in @matches.

So the correct regex becomes /(^[0-9]+\. (?:.+\n)*)/gm. This way the whole matched result is captured into $1.

It will also work without the outer parentheses, because when the pattern contains no capture groups the match returns the whole match ($&) by default. So remember: in cases like this, use a non-capturing group (?: ... ) instead of a capturing group ( ... ). Wrapping it up into a program yields:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $str = '
1. This should
match

2. This should too

3. This
one
also
';

# The outer group captures each whole block; the inner (?: ... ) repeats without capturing.
my @matches = $str =~ /^([0-9]+\. (?:.+\n)*)/gm;

print Dumper(\@matches);

Output:

$VAR1 = [
          '1. This should
match
',
          '2. This should too
',
          '3. This
one
also
'
        ];
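
To make the difference concrete, here is a small sketch that runs the question's original pattern and a non-capturing variant side by side on the same text. The variable names @captured and @whole are only illustrative.

```perl
use strict;
use warnings;

my $str = "1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

# Capturing group: in list context, /g returns what the group captured,
# and a repeated group keeps only its last iteration (the last line).
my @captured = $str =~ /^[0-9]+\. (.+\n)*/mg;

# Non-capturing group: no captures exist, so each whole match is returned.
my @whole = $str =~ /^[0-9]+\. (?:.+\n)*/mg;

print "captured: [$_]\n" for @captured;
print "whole:    [$_]\n" for @whole;
```

With the capturing group, @captured holds only the last line of each block ("match", "This should too", "also", each with its newline), which is the "single lines" symptom described in the question. With the non-capturing group, @whole holds the three complete blocks.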

Casimir et Hippolyte · Accepted Answer · 2016-11-29 10:02:27Z

In this situation, instead of reading the file line by line, you should read it paragraph by paragraph. To do that, set $/ (the input record separator) to the empty string. Example:

use strict;
use warnings;

my @result;

{
    local $/ = "";
    while (<DATA>) {
        chomp;
        push @result, $_ ;
        # or to filter paragraphs that don't start with a digit, use instead:
        # push @result, $_ if /^[0-9]+\./; 
    }
}


__DATA__
1. This should
match

2. This should too

3. This
one
also

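A variant of the same idea that can be run without a __DATA__ section is sketched below. It reads from an in-memory filehandle opened on a string, keeps only the paragraphs that start with a number, and prints them. The sample text, including its extra "intro line" paragraph, and the names $text and $fh are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Sample input (assumed), with one paragraph that should be filtered out.
my $text = "intro line\n\n1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

# Open a read-only filehandle on the string itself.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @result;
{
    local $/ = "";                 # paragraph mode: records end at blank lines
    while (my $para = <$fh>) {
        chomp $para;               # in paragraph mode, strips all trailing newlines
        push @result, $para if $para =~ /^[0-9]+\. /;
    }
}
close $fh;

print "[$_]\n" for @result;
```

Because each record is one paragraph, the filter needs no /m or /g flags: ^ only has to match at the start of the paragraph.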