@matches = ( $filestr =~ /^[0-9]+\. (.+\n)*/mg );

I have a file that has been read into $filestr. The regex above should match the beginning of a line, followed by a number, a dot, a space, and then any number of non-empty lines each ending in a newline (so the match ends at the first line that contains only a newline). Instead, it seems to produce just some single lines from the file.

When I do something like

@matches = ( $filestr =~ /^[0-9]+\. .+\n/mg );

I correctly match a single line.

When I do this

@matches = ( $filestr =~ /^[0-9]+\. .+\n.+\n/mg );

I match the same single lines, followed by some seemingly unrelated line. What's wrong with my regex?

Note: The regex works fine in this regex tester: https://regex101.com/. It just won't work in Perl.

Example, in this text:

1. This should
match

2. This should too

3. This
one
also

the regex should match

1. This should
match

and

2. This should too

and

3. This
one
also
  • Just FYI: when line breaks come into play, consider using \R instead of \n. However, here you'd better change the whole approach and read line by line checking each subsequent one. Commented Nov 29, 2016 at 9:30
  • Thanks for the suggestion. I just tried \R but I get the same result as with \n. Commented Nov 29, 2016 at 9:32
  • Do you know of a good way to check line by line the way you suggested? It seems like I would essentially be manually splitting apart the regex. First checking if a line matched ^[0-9]+\. , then checking if a line matched .+\n for the rest of the first line and all subsequent lines (until I got a line with only a single newline on it, at which point I would have to restart). Commented Nov 29, 2016 at 9:35
  • Could you post the sample lines for matching the regex? Commented Nov 29, 2016 at 9:41
  • I can only suggest a regex fix like /^[0-9]+\..*?(?:\R{2}|\z)/gsm Commented Nov 29, 2016 at 9:48
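The line-by-line approach discussed in the comments above could look roughly like the sketch below. It is only an illustration: the sample string stands in for the file contents, and the names @matches and $current are chosen here for the example. It mirrors the original regex in that an item starts at a numbered line and ends at the first empty line.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the contents of the file.
my $filestr = "1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

my @matches;
my $current;    # the item being accumulated, or undef when outside an item

# split /^/ breaks the string into lines and keeps each trailing newline.
for my $line (split /^/, $filestr) {
    if (defined $current) {
        if ($line =~ /^$/) {
            # An empty line ends the current item.
            push @matches, $current;
            undef $current;
        }
        else {
            # Any other line continues the current item.
            $current .= $line;
        }
    }
    elsif ($line =~ /^[0-9]+\. /) {
        # A numbered line starts a new item.
        $current = $line;
    }
}

# Flush the last item if the input did not end with an empty line.
push @matches, $current if defined $current;

print scalar(@matches), " items found\n";
```

This trades the single regex for a small state machine, which is easier to extend (for example, to record line numbers) but is more code for the same result.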

2 Answers

Your regex is right, but you are capturing only part of the result. In list context, a match with /g returns the contents of the capturing groups whenever the pattern has any, not the whole match. Because your group (.+\n)* repeats, $1 holds only the last line it matched, which is why @matches ends up with single lines.

One fix is to capture the whole match in a single group, so that the whole match is what gets stored in @matches: /^([0-9]+\. (?:.+\n)*)/gm. Each match is then captured in $1.

It also works without the outer parentheses, because when a pattern has no capturing groups, a list-context match returns the whole match (i.e. $&). So in cases like this, use a non-capturing group (?: ... ) instead of a capturing group ( ... ) for anything you need only for grouping. Wrapped up in a program:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $str = '
1. This should
match

2. This should too

3. This
one
also
';

my @matches = $str =~ /^([0-9]+\. (?:.+\n)*)/gm;

print Dumper(\@matches);

Output:

$VAR1 = [
          '1. This should
match
',
          '2. This should too
',
          '3. This
one
also
'
        ];
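To make the difference between the two kinds of group concrete, here is a small side-by-side sketch. It assumes the same sample text as above; the array names @captured and @whole are chosen only for this illustration.

```perl
use strict;
use warnings;

my $str = "1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

# Capturing group: list context returns $1 for each match, and a repeated
# group keeps only its final repetition.
my @captured = $str =~ /^[0-9]+\. (.+\n)*/mg;

# Non-capturing group: there are no captures, so list context returns
# each whole match instead.
my @whole = $str =~ /^[0-9]+\. (?:.+\n)*/mg;

print "captured: ", join('|', @captured);
print "whole:    ", join('|', @whole);
```

With this input, @captured holds only the last line of each item, while @whole holds the three complete items.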

In this situation, instead of reading the file line by line, you should read it paragraph by paragraph. To do that, set $/ to the empty string. For example:

use strict;
use warnings;

my @result;

{
    local $/ = "";
    while (<DATA>) {
        chomp;
        push @result, $_ ;
        # or to filter paragraphs that don't start with a digit, use instead:
        # push @result, $_ if /^[0-9]+\./; 
    }
}


__DATA__
1. This should
match

2. This should too

3. This
one
also
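If the text is already in a scalar, as with $filestr in the question, the same paragraph-mode idea can be applied by opening a filehandle on a reference to that scalar. The following is a sketch under that assumption; the sample string and the names $fh and $para are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the already-read file contents.
my $filestr = "1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

# Open an in-memory filehandle that reads from the scalar.
open my $fh, '<', \$filestr or die "Cannot open in-memory handle: $!";

my @result;
{
    local $/ = "";    # paragraph mode: records are separated by blank lines
    while (my $para = <$fh>) {
        chomp $para;  # in paragraph mode this strips all trailing newlines
        push @result, $para if $para =~ /^[0-9]+\. /;
    }
}
close $fh;

print "$_\n---\n" for @result;
```

Unlike the regex version, the paragraphs collected this way have their trailing newlines removed by chomp.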
