Regex match multiple pattern

Question

Below is my test string:

Object: TLE-234DSDSDS324-234SDF324ER
  Page location: SDEWRSD3242SD-234/324/234 (1)
    org-chart           Lorem ipsum dolor    consectetur adipiscing          # Colorado
    234DSDSDS324-32-4/2/7-page2 (2) loc log  Apr 18 21:42:49 2017           1
      Page information: 3.32.232.212.23, Error: fatal, Technique: color
        Comments: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 
      Validation status: Lorem ipsums dolors sits amets, consectetur adipiscing elit
       Positive control-export: Validated
  Page location: SDEWRSD3242SD-SDF/234/324 (5)
    org-chart           Lorem ipsum dolor    consectetur adipiscin          # Arizona
    234DSDSDS324-23-11/1/0-page1 (1) loc log Apr 18 21:42:49 2017           1
      Page information: 3.32.232.212.23, Error: log, Technique: color
        Comments: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
      Validation status: Lorem ipsums dolors sits amets, consectetur adipiscing elit
       Positive control-export: Validated

I need to capture the strings after "Page location: ", "Object: " and "Comments: ".

For example:

Object: TLE-234DSDSDS324-234SDF324ER - Group 1

Page location: SDEWRSD3242SD-234/324/234 (1) - Group 2

Page location: SDEWRSD3242SD-SDF/234/324 (5) - Group 3

Comments: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. - Group 4

Comments: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. - Group 5

Here is my regex URL.

I am able to capture the strings, but the regex won't capture them if any one of the strings is repeated.

You're having problems if, e.g., Page location occurs multiple times, is this right?
– Jan, Commented May 9, 2017 at 21:59
Exactly, it is not matching if I add one more Page location and Comments to the test string.
– Raja, Commented May 9, 2017 at 22:10
Is all this in one string (or is it in separate lines), and how do you get this data into the program? Is "Page location:" unique, so that you always need what follows it? How far after "Page location" do you need to capture: to the first newline? This is all shown "inside" of one "Object"; are there multiple such sections in your string/file?
– zdim, Commented May 9, 2017 at 22:36

zdim · Accepted Answer · 2017-05-11 06:02:57Z

(See comments below the question for the problem description.)

The data is in a multi-line string, with multiple sections starting with Object:. Within each there are multiple lines starting with the phrases Page location: and Comments:. The rest of the line after each of these phrases needs to be captured, and everything organized by Object.

Instead of attempting a tortured multi-line "single" regex, break the string into lines and process section by section. This way the problem becomes a very simple one.

The results are stored in an array of hashrefs; each has the shown phrases for keys. Since a phrase can appear more than once per section, the values are arrayrefs (holding what follows the phrase on each line).

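use warnings;
use strict;
use feature 'say';

my $input_string = '...';
my @lines = split /\n/, $input_string;

# Phrases that start the lines of interest
my $patt = qr/Object|Page location|Comments/;

my @sections;
for (@lines)
{
    # $1 is the phrase, $2 is the rest of the line
    next if not /^\s*($patt):\s*(.*)/;

    # Each Object line starts a new section
    push @sections, {}  if $1 eq 'Object';

    # Skip a phrase seen before any Object line (no section to add to yet)
    next if not @sections;

    push @{ $sections[-1]->{$1} }, $2;
}

foreach my $sec (@sections) {
    foreach my $key (sort keys %$sec) {
        say "$key:";
        say "\t$_" for @{$sec->{$key}};
    }
}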

With the input string copied (suppressed above for brevity), the output is

Comments:
        Lorem ipsum dolor sit amet,  [...]
        Lorem ipsum dolor sit amet,  [...]
Object:
        TLE-234DSDSDS324-234SDF324ER
Page location:
        SDEWRSD3242SD-234/324/234 (1)
        SDEWRSD3242SD-SDF/234/324 (5)

A few comments.

Once the Object line is found we add a new hashref to @sections. Then the match for a pattern is set as a key and the rest of its line added to its arrayref value. This is done for the current (so last) element of @sections.
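
To illustrate how the sections stay separate when more than one Object appears, here is a minimal sketch. The sub name parse_sections and the short sample values (A-1, P-1, and so on) are made up for the illustration and are not part of the original data; the check that skips phrases seen before any Object line is an added assumption about how such input should be treated.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: the loop from the answer, wrapped in a sub
sub parse_sections {
    my ($text) = @_;
    my $patt = qr/Object|Page location|Comments/;
    my @sections;
    for (split /\n/, $text) {
        next if not /^\s*($patt):\s*(.*)/;
        push @sections, {} if $1 eq 'Object';   # a new section per Object
        next if not @sections;                  # assumption: ignore phrases before the first Object
        push @{ $sections[-1]{$1} }, $2;        # add to the current (last) section
    }
    return @sections;
}

# Made-up sample input with two Object sections
my $text = <<'END';
Object: A-1
  Page location: P-1 (1)
      Comments: first
  Page location: P-2 (5)
      Comments: second
Object: B-2
  Page location: P-3 (2)
      Comments: third
END

my @sections = parse_sections($text);
for my $i (0 .. $#sections) {
    say "Section $i: $sections[$i]{Object}[0]";
    say "  pages: ", scalar @{ $sections[$i]{'Page location'} };
}
```

With this sample the first section ends up with two Page location entries and the second with one, so data from different Objects does not get mixed.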

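This adds an empty string if a phrase had nothing following it on the line. To disallow that, add next if not length $2; before the final push (a plain next if not $2; would also drop a value of 0).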
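
A small sketch of such a check, using three made-up Comments lines (they are not from the original data). It tests the length of the captured value, so only a truly empty value is skipped while a value of 0 is kept.

```perl
use strict;
use warnings;

my @kept;
for my $line ('Comments: 0', 'Comments:', 'Comments: ok') {
    next if not $line =~ /^\s*(Comments):\s*(.*)/;
    next if not length $2;     # skip only when nothing follows the phrase
    push @kept, $2;
}
print join(',', @kept), "\n";  # prints "0,ok"
```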

Note. An easy and common way to print complex data structures is via the core module Data::Dumper. Also see Data::Dump (from CPAN) for a much more compact printout.
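
For instance, a minimal sketch of dumping a structure shaped like @sections with Data::Dumper. The structure is filled in by hand here with values from the question (Comments left out for brevity), and the two settings are optional choices made for this example, giving sorted keys and a tighter indent.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-built stand-in for what the loop produces for the question's data
my @sections = (
    {
        'Object'        => [ 'TLE-234DSDSDS324-234SDF324ER' ],
        'Page location' => [
            'SDEWRSD3242SD-234/324/234 (1)',
            'SDEWRSD3242SD-SDF/234/324 (5)',
        ],
    },
);

$Data::Dumper::Sortkeys = 1;   # stable key order in the printout
$Data::Dumper::Indent   = 1;   # less whitespace than the default

my $dump = Dumper(\@sections);
print $dump;
```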

edited May 11, 2017 at 6:02

answered May 9, 2017 at 23:08

zdim


2 Comments

Joe McMahon Over a year ago

I'm guessing here, but does OP need to restart his collection of data for each new object? In that case I'd use an array of hashes; start with an empty hash, capture into it until a new 'Object:' is seen, at which point a new empty anonymous hash is pushed onto the array and data is captured into that. Totally agree that a loop is a far better solution here than trying to compress the logic into a single regex!

zdim Over a year ago

@JoeMcMahon Yes, this needs to be adjusted if multiple Object sections exist and need be distinguished. I asked a question, in comments and in the answer itself, and am waiting :). The approach with one regex assumes that all data is available in a string, which isn't clear. My question was answered that it's "separate lines" and then reading a file is common ... I am waiting on that clarification as well.
