
I have a huge data file and I am trying to search for a string in each line, saving only the matched part (not the entire line) in an array.

Here is the code I have tried:

use strict;
use warnings;
use Data::Dumper;

my $start_run = time();

while (<DATA>) {
    my $line = $_;
    if ($line =~ m/Date/) {
        # grep returns every list element that matches, i.e. the whole line
        my @result = grep (/Date/, $line);
        print @result;
    }
}


#####
my $end_run = time();
my $run_time = sprintf "%.2f", (($end_run - $start_run) / 60);
print "Elapsed: $run_time minutes\n";


__DATA__
ServerName: (DESCRIPTION=(CONNECT_TIMEOUT=60)(RETRY_COUNT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=xbian.dbaas.ing.net)(PORT=121))(CONNECT_DATA=(SERVER=DEDI)(SERVICE_NAME=pmx0))) ServerType: Oracle DatabaseName: MX_FN_OWNER RDBMSAccess: NATIVE_OCI ConnectionName: Mx0_MUXFO_1_1 ConnectionNo: 1  Date: 2020-03-29 08:58:10
insert into MX_FN_OWNER.TRN_EDBF (TIMESTAMP,M_IDENTITY,M_REFERENCE,M_USER,M_GROUP,M_DESK,M_DATE_SYS,M_DATE_CMP,M_TIME_CMP,M_SDATE_CMP,M_STIME_CMP,M_COMMENT,M_ERROR,M_START_END,M_TIME_CPU,M_TIME_SYB,M_TIME_ELAP,M_SCRPT_NAME,M_UNIT_NAME,M_ERR_COUNT,M_NPID) values (0,TRN_EODA_DBFS.nextval,:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19) (Bulk_Copy begin, 19 columns, 1 Flush size)

                              ==============================================
ServerName: (DESCRIPTION=(CONNECT_TIMEOUT=60)(RETRY_COUNT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=xb305-scan.net)(PORT=121))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=pmx02fn))) ServerType: Oracle DatabaseName: MX_FN_OWNER RDBMSAccess: NATIVE_OCI ConnectionName: Mx0_MXFO_168991_1 ConnectionNo: 1  Date: 2020-03-29 09:21:10
Mux execution time: 00:00:00   3 ms 

Each of these lines has a Date, and I am only interested in the date and time, so that I can subtract the times of two lines and save the result. But when I try grep, the output is the entire line. I could not split the line, as there is no delimiter.

Is there a way I can get just the Date: 2020-03-29 09:21:10 part associated with each line?

Conversion Script

#!/usr/bin/perl

use strict;
use warnings;

use DateTime::Format::Strptime;

my $parser = DateTime::Format::Strptime->new(
  pattern => 'd{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}',
  on_error => 'croak',
);

my $dt = $parser->parse_datetime('2020-03-29 08:58:10');

print "$dt\n";

Thanks

3 Answers


You could match a date-like pattern and use \K to reset the start of the reported match, so that only the date and time are kept.

Note that the pattern does not validate the date or time itself.

Then push the whole match, available in $&, onto an array.

\bDate:\h+\K\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}$

Explanation

  • \bDate:\h+ Match Date: at a word boundary, followed by 1+ horizontal whitespace chars
  • \K Reset the match buffer (forget what was matched so far)
  • \d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2} Match a date-time-like pattern
  • $ Assert the end of the string; use this only if the value is always at the end of the line
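A minimal sketch comparing the \K approach with an ordinary capture group, using a single line shaped like the question's data (shortened here for readability; variable names are illustrative). A capture group avoids $& entirely, which matters on older Perls: before Perl 5.20, using $& anywhere in a program slowed down every regex match, a cost that adds up on a huge file.

```perl
use strict;
use warnings;

# One line in the same shape as the question's data (ServerName part shortened)
my $line = "ServerName: (...) ConnectionNo: 1  Date: 2020-03-29 08:58:10\n";

# 1) \K: everything matched before \K is left out of $&
my $via_k;
if ($line =~ m/\bDate:\h+\K\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}$/) {
    $via_k = $&;
}

# 2) Capture group: match "Date: ..." as usual, but read only the part in parentheses
my $via_capture;
if ($line =~ m/\bDate:\h+(\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2})$/) {
    $via_capture = $1;
}

print "$via_k\n";         # 2020-03-29 08:58:10
print "$via_capture\n";   # 2020-03-29 08:58:10
```

Either form can be used inside the while loop; with the capture group you would push $1 instead of $&.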


For example:

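use strict;
use warnings;
use Time::Piece;   # provides strptime(); subtracting two objects gives a Time::Seconds

# Reads the __DATA__ section from the question
my @arr;
while (my $line = <DATA>) {
    # \K drops "Date: " from the match, so $& holds only the date and time
    if ($line =~ m/\bDate:\h+\K\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}$/) {
        push(@arr, $&);
    }
}

# Compare each timestamp with the next one
for my $i (0 .. $#arr - 1) {
    my $currentDateTime = Time::Piece->strptime($arr[$i],     "%Y-%m-%d %H:%M:%S");
    my $nextDateTime    = Time::Piece->strptime($arr[$i + 1], "%Y-%m-%d %H:%M:%S");

    my $diff = $nextDateTime - $currentDateTime;   # Time::Seconds object
    print $diff->minutes, " minutes\n";
}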

Output

23 minutes

You could narrow down the date pattern using ranges. It still does not fully validate the date; for example, 2020-02-31 would still match.

\bDate:\h+\K\d{4}-(?:1[0-2]|0?[1-9])-(?:3[01]|[12][0-9]|0?[1-9])\h+(?:2[0-3]|[01]?[0-9]):[0-5]?[0-9]:[0-5]?[0-9]$
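If you do want real validation, one option (a sketch using the core Time::Local module rather than a bigger regex) is to let timegm() range-check the fields: it croaks on out-of-range values such as day 31 in February or hour 24. The helper name is_valid_datetime is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Returns 1 if "YYYY-MM-DD HH:MM:SS" names a real calendar date and time, else 0.
# timegm() croaks on out-of-range fields, so the call is wrapped in eval.
sub is_valid_datetime {
    my ($str) = @_;
    my @f = $str =~ /^(\d{4})-(\d{2})-(\d{2})\h+(\d{2}):(\d{2}):(\d{2})$/;
    return 0 unless @f;
    my ($y, $mo, $d, $h, $mi, $s) = @f;
    # timegm() takes a 0-based month; a 4-digit year is used as the actual year
    return eval { timegm($s, $mi, $h, $d, $mo - 1, $y); 1 } ? 1 : 0;
}

for my $candidate ('2020-03-29 08:58:10', '2020-02-31 08:58:10', '2020-03-29 24:00:00') {
    printf "%s => %s\n", $candidate, is_valid_datetime($candidate) ? 'valid' : 'invalid';
}
```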



Comments

Wow, that works, thanks. Assuming there are 1 million lines, each with a date, and I want to calculate the elapsed time between consecutive lines: would saving all the dates in an array and subtracting the times work? Is that the correct approach?
For example, 09:21 - 08:58 = 23 minutes for the two lines above; likewise, I want to calculate the elapsed time between all the lines, one pair at a time.
You could, for example, loop over the array and parse the current entry and the next entry into DateTime objects, then perform your calculations.
Hi, I have tried the code below, but it doesn't work; maybe I have messed up the pattern. I have updated the question with the new conversion code. Please let me know.
Oh, I did not notice that you had updated the code. Thanks a ton!
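Regarding the 1 million lines: saving every date in an array and then subtracting neighbours works, but it holds all the dates in memory. A minimal sketch of an alternative that keeps only the previous timestamp while reading, assuming the same Date: format as above and using the core Time::Piece module. for_each_gap and its callback interface are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;
use Time::Piece;

# Reads $fh line by line and calls $on_gap->($seconds, $from, $to) for each pair
# of consecutive "Date:" timestamps. Only the previous timestamp is kept in memory.
sub for_each_gap {
    my ($fh, $on_gap) = @_;
    my $prev;
    while (my $line = <$fh>) {
        next unless $line =~ /\bDate:\h+(\d{4}-\d{2}-\d{2})\h+(\d{2}:\d{2}:\d{2})\s*$/;
        my $t = Time::Piece->strptime("$1 $2", '%Y-%m-%d %H:%M:%S');
        $on_gap->($t->epoch - $prev->epoch, $prev, $t) if defined $prev;
        $prev = $t;
    }
    return;
}

# Usage: an in-memory filehandle stands in for the real log file
my $log = <<'END';
ServerName: ... ConnectionNo: 1  Date: 2020-03-29 08:58:10
insert into ...
ServerName: ... ConnectionNo: 1  Date: 2020-03-29 09:21:10
Mux execution time: 00:00:00   3 ms
END

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my @gaps;   # collected here only so the demo can be checked
for_each_gap($fh, sub { push @gaps, $_[0]; printf "%d seconds\n", $_[0] });
close $fh;
```

In a real run you would open the log file instead of the in-memory string and handle each gap inside the callback rather than collecting them.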

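Try this. It collects the dates with the same regex and parses them with DateTime::Format::Strptime, whose pattern uses strptime conversions (%Y-%m-%d %H:%M:%S) rather than regex syntax: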

use strict;
use warnings;
use DateTime::Format::Strptime;

# Reads the __DATA__ section from the question
my @arr;

# Strptime patterns use %-conversions, not regex syntax
my $parser = DateTime::Format::Strptime->new(
  pattern  => '%Y-%m-%d %H:%M:%S',
  on_error => 'croak',
);

while (my $line = <DATA>)
{
   if ($line =~ m/\bDate:\h+\K\d{4}-\d{2}-\d{2}\h+\d{2}:\d{2}:\d{2}$/) {
      push(@arr, $&);
   }
}

# Compare each timestamp with the next one
for my $i (0 .. $#arr - 1) {
    my $currentDateTime = $parser->parse_datetime($arr[$i]);
    my $nextDateTime    = $parser->parse_datetime($arr[$i + 1]);
    my $diff = $nextDateTime - $currentDateTime;   # DateTime::Duration object
    print $diff->hours, ":", $diff->minutes, "\n";
}


The desired result can be achieved with the following algorithm:

Extract the data into an array by searching each line for the pattern \bDate: (\S+ \S+)$.

Pass the data array to a subroutine that computes the differences between consecutive elements and returns a reference to a result array (each element is a hash reference with the keys hours and minutes).

Output the contents of the result array.

use strict;
use warnings;
use feature 'say';

use DateTime::Format::Strptime;

my @data;
my $re = qr/\bDate: (\S+ \S+)$/;

# Read line by line and keep only the captured date and time
while (<DATA>) {
    push @data, $1 if /$re/;
}

say "$_->{hours}:$_->{minutes}" for @{ time_diff(\@data) };

sub time_diff {
    my $data = shift;    # array reference of "YYYY-MM-DD HH:MM:SS" strings
    my @result;

    my $parser = DateTime::Format::Strptime->new(
      pattern  => '%Y-%m-%d %H:%M:%S',
      on_error => 'croak',
    );

    # $#$data is the last index of the referenced array (not the file-scoped @data)
    for (1 .. $#$data) {
        my $begin = $parser->parse_datetime( $data->[$_-1] );
        my $end   = $parser->parse_datetime( $data->[$_]   );
        my $diff  = $end - $begin;    # DateTime::Duration object

        push @result, { hours => $diff->hours, minutes => $diff->minutes };
    }

    return \@result;
}

__DATA__
ServerName: (DESCRIPTION=(CONNECT_TIMEOUT=60)(RETRY_COUNT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=xbian.dbaas.ing.net)(PORT=121))(CONNECT_DATA=(SERVER=DEDI)(SERVICE_NAME=pmx0))) ServerType: Oracle DatabaseName: MX_FN_OWNER RDBMSAccess: NATIVE_OCI ConnectionName: Mx0_MUXFO_1_1 ConnectionNo: 1  Date: 2020-03-29 08:58:10
insert into MX_FN_OWNER.TRN_EDBF (TIMESTAMP,M_IDENTITY,M_REFERENCE,M_USER,M_GROUP,M_DESK,M_DATE_SYS,M_DATE_CMP,M_TIME_CMP,M_SDATE_CMP,M_STIME_CMP,M_COMMENT,M_ERROR,M_START_END,M_TIME_CPU,M_TIME_SYB,M_TIME_ELAP,M_SCRPT_NAME,M_UNIT_NAME,M_ERR_COUNT,M_NPID) values (0,TRN_EODA_DBFS.nextval,:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19) (Bulk_Copy begin, 19 columns, 1 Flush size)

                              ==============================================
ServerName: (DESCRIPTION=(CONNECT_TIMEOUT=60)(RETRY_COUNT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=xb305-scan.net)(PORT=121))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=pmx02fn))) ServerType: Oracle DatabaseName: MX_FN_OWNER RDBMSAccess: NATIVE_OCI ConnectionName: Mx0_MXFO_168991_1 ConnectionNo: 1  Date: 2020-03-29 09:21:10
Mux execution time: 00:00:00   3 ms 

ServerName: (DESCRIPTION=(CONNECT_TIMEOUT=60)(RETRY_COUNT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=xb305-scan.net)(PORT=121))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=pmx02fn))) ServerType: Oracle DatabaseName: MX_FN_OWNER RDBMSAccess: NATIVE_OCI ConnectionName: Mx0_MXFO_168991_1 ConnectionNo: 1  Date: 2020-03-30 07:11:05
Mux execution time: 00:00:00   3 ms

ServerName: (DESCRIPTION=(CONNECT_TIMEOUT=60)(RETRY_COUNT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=xb305-scan.net)(PORT=121))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=pmx02fn))) ServerType: Oracle DatabaseName: MX_FN_OWNER RDBMSAccess: NATIVE_OCI ConnectionName: Mx0_MXFO_168991_1 ConnectionNo: 1  Date: 2020-03-30 21:49:42
Mux execution time: 00:00:00   3 ms

Output

0:23
21:49
14:38
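The output above shows only hours and minutes, and the minutes are not zero-padded. As far as I understand DateTime's subtraction, whole days (and months) go into separate components of the DateTime::Duration, so ->hours and ->minutes alone would under-report a gap longer than a day. A sketch of a formatter (the name format_elapsed is hypothetical) that works from a plain number of seconds and keeps the full hour count:

```perl
use strict;
use warnings;

# Formats a number of seconds as H:MM:SS. Hours are not reduced modulo 24,
# so a gap of more than a day still shows its full length.
sub format_elapsed {
    my ($seconds) = @_;
    my $sign = $seconds < 0 ? '-' : '';
    $seconds = abs $seconds;
    return sprintf '%s%d:%02d:%02d', $sign,
        int($seconds / 3600), int($seconds % 3600 / 60), $seconds % 60;
}

print format_elapsed(1380), "\n";
print format_elapsed(26 * 3600 + 5 * 60 + 7), "\n";
```

A difference in seconds such as $end->epoch - $begin->epoch (both DateTime and Time::Piece objects provide epoch) can be passed straight in.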
