
I need help splitting the following string into three fields (Date, ID, msecs):

May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec

I only want the first part of the ID before the first underscore.

This is what I want the output to look like:

May 26 09:33:33, 0191070818, 180

I am having trouble figuring out what to put in the regex:

use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split('/[]/', $data);

foreach my $val (@values) {
  print "$val\n";
}

exit 0;

6 Answers


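OK. That split just isn't going to work. Quoting the first argument doesn't make split treat it literally: apart from the special case ' ' (a single space), the string is compiled as a regular expression. So '/[]/' becomes the pattern /[]/, and because a ] straight after [ counts as a literal character, the character class is never closed and Perl dies with an "Unmatched [ in regex" error. Even without that error, there are no slashes or square brackets in your sample text for it to split on.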

split cuts a string up wherever a field separator occurs, which probably isn't what you want here. For example:

 split ( ' ', $data ); 

returns this list (as dumped by Data::Dumper):

$VAR1 = [
          'May',
          '26',
          '09:33:33',
          'localhost',
          'archiver:',
          'saving',
          'ID',
          '0091070818_1432647213_489715',
          'took',
          '180',
          'msec'
        ];
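For reference, a single space ' ' is the one first argument that split does not treat as an ordinary pattern: it splits on runs of whitespace and ignores leading whitespace, like awk. A minimal sketch contrasting it with a plain / / pattern (the sample string with two leading spaces is made up for illustration):

```perl
use strict;
use warnings;

my $line = '  May 26 09:33:33';    # note the two leading spaces

# ' ' is special: split on runs of whitespace, ignoring leading whitespace
my @awk_style = split ' ', $line;

# / / is an ordinary pattern: every single space is a separator,
# so the two leading spaces produce two empty leading fields
my @literal = split / /, $line;

print scalar(@awk_style), " fields: @awk_style\n";
print scalar(@literal),   " fields: ", join( '|', @literal ), "\n";
```

The first print shows 3 fields and the second shows 5, the first two of them empty.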

Given that your string doesn't break into fields cleanly that way (the date spans three fields and the ID carries extra parts), I'd suggest a different approach.

Instead, pick out the parts you want with pattern matches. Assuming no oddly formatted records are mixed in:

use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my ($time_str) = ( $data =~ m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );    # date and time at the start
my ($id)       = ( $data =~ m/(\d+)_/ );                          # digits before the first underscore
my ($msec)     = ( $data =~ m/(\d+) msec/ );                      # number in front of "msec"
print "$time_str, $id, $msec\n";

Note: you can combine these patterns into one (as some of the other answers do). I've kept them separate to make it clearer what's happening. Each match is applied to $data (because of =~), and whatever the parentheses () capture is returned and assigned to the variable on the left-hand side.

(Note: you need the parentheses in my ($msec). They put the match in list context, so it returns the captured value; without them, the match is in scalar context and returns only whether it succeeded (true/false).)
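A quick sketch of the difference those parentheses make, using the sample line from the question:

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# List context (parentheses around the variable): returns the captured text
my ($msec_list) = ( $data =~ m/(\d+) msec/ );

# Scalar context (no parentheses): returns true (1) or false, not the capture
my $msec_scalar = ( $data =~ m/(\d+) msec/ );

print "my (\$msec): $msec_list\n";
print "my \$msec:   $msec_scalar\n";
```

The first line prints 180 and the second prints 1.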


2 Comments

Thank you! Now, if I were reading multiple lines like that one from a text file, would I do something like this? while(<IN>){ if(/saving ID/){ my ($time_str) = ( m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );
Yes, pretty much, although I'd suggest using open( my $input, "<", $filename ) or die $! and then while ( <$input> ) { instead. (It does more or less the same thing, but a lexical filehandle with three-argument open is better style.)
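A minimal sketch of that loop, assuming the log file's path is passed on the command line; read_log is a hypothetical helper name, not part of any library. It reads each line into a lexical $line rather than $_ and skips lines that don't mention "saving ID":

```perl
use strict;
use warnings;

# Hypothetical helper: read a log file and return one [date, id, msecs]
# array ref for each "saving ID" line; all other lines are skipped.
sub read_log {
    my ($filename) = @_;
    open( my $input, '<', $filename ) or die "Can't open $filename: $!";
    my @records;
    while ( my $line = <$input> ) {
        next unless $line =~ /saving ID/;
        my ($time_str) = ( $line =~ m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );
        my ($id)       = ( $line =~ m/(\d+)_/ );
        my ($msec)     = ( $line =~ m/(\d+) msec/ );
        push @records, [ $time_str, $id, $msec ];
    }
    close $input;
    return @records;
}

# Usage: perl this_script.pl archiver.log   (the file name is hypothetical)
if (@ARGV) {
    print join( ', ', @$_ ), "\n" for read_log( $ARGV[0] );
}
```

Returning the records instead of printing inside the loop keeps the parsing separate from the output format, so the same helper can feed a CSV writer or a report later.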

It might even be simplest to just split the data on whitespace (and then reconstruct the date by joining together the first three fields). It's not very sophisticated, but it gets the job done.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split(/\s+/, $data);

my $date = join ' ', @values[0,1,2];
my $id   = $values[7];
my $time = $values[9];

say "Date: $date";
say "ID:   $id";
say "Time: $time";

Which gives:

Date: May 26 09:33:33
ID:   0091070818_1432647213_489715
Time: 180
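Note that the ID printed above still carries the _1432647213_489715 suffix, while the question asks only for the part before the first underscore. One way to trim it, sketched here as a small variation on the same whitespace-split approach, is a second split on _ with a limit of 2:

```perl
use strict;
use warnings;
use 5.010;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split( /\s+/, $data );

my $date = join ' ', @values[ 0, 1, 2 ];
# Keep only the part of the ID before the first underscore
# (the limit of 2 stops split after the first separator)
my ($id) = split /_/, $values[7], 2;
my $time = $values[9];

say "$date, $id, $time";
```

This prints the line in the format the question asks for: May 26 09:33:33, 0091070818, 180.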


split doesn't look like the correct tool for the job. I'd use a regex match:

my @values = $data =~ /^([[:alpha:]]{3}\s[0-9][0-9]\s[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) # date & time
                       \s.*?\sID\s
                       ([0-9]+)            # ID
                       .*\stook\s
                       ([0-9]+)            # duration
                       \smsec/x;
print join(',', @values), "\n";

2 Comments

Using \d with the /a modifier is a nice alternative to [0-9]
@Borodin: In 5.14+, yes.
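The /a modifier restricts \d (and \s, \w and the POSIX classes) to ASCII. A sketch of the same pattern written with \d under /xa, plus a check on a non-ASCII digit to show what /a rules out (requires Perl 5.14 or later):

```perl
use strict;
use warnings;
use 5.014;    # the /a modifier needs Perl 5.14 or later

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = $data =~ /^([[:alpha:]]{3}\s\d\d\s\d\d:\d\d:\d\d)  # date & time
                       \s.*?\sID\s
                       (\d+)              # ID
                       .*\stook\s
                       (\d+)              # duration
                       \smsec/xa;
print join( ',', @values ), "\n";

# Under /a, \d matches only 0-9; without it, \d also matches other
# Unicode decimal digits such as U+0664 (ARABIC-INDIC DIGIT FOUR).
my $arabic_four = "\x{0664}";
my $plain = $arabic_four =~ /^\d$/  ? 'match' : 'no match';
my $ascii = $arabic_four =~ /^\d$/a ? 'match' : 'no match';
print "\\d: $plain, \\d with /a: $ascii\n";
```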

It's probably best to do this with three separate patterns, as the code below demonstrates.

I've used the /x modifier so that I can put spaces in the regex patterns for improved readability.

Unless you are certain that your data will be well-formed (for example, because it is the output of a program), you should add tests to make sure that all three values are defined after the pattern matches. Alternatively, you can test the result of each pattern match directly.

use strict;
use warnings;
use v5.10;

my $s = 'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec';

for ( $s ) {

    my ($date)  = / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = / ID \s+ (\d+) _ /x;
    my ($msecs) = / (\d+) \s+ msec /x;

    say join ',', $date, $id, $msecs;
}

Output:

May 26 09:33:33,0191070818,180
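A sketch of the defined-ness check suggested above, run over a made-up second line that has no ID; any line where one of the three patterns fails is reported with warn and skipped:

```perl
use strict;
use warnings;
use v5.10;

# The second line is a made-up example of a record without an ID
my @lines = (
    'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec',
    'May 26 09:33:34 localhost archiver: connection reset',
);

my @parsed;
for (@lines) {
    my ($date)  = / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = / ID \s+ (\d+) _ /x;
    my ($msecs) = / (\d+) \s+ msec /x;

    # If any of the three patterns failed, report the line and move on
    unless ( defined $date and defined $id and defined $msecs ) {
        warn "Skipping unrecognised line: $_\n";
        next;
    }

    push @parsed, [ $date, $id, $msecs ];
    say join ',', $date, $id, $msecs;
}
```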


I don't know that split() is the best approach. This code matches your target ID and extracts it:

my ($id) = $data =~ m/(?<=ID )[^_]+/g;

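The regex uses a look-behind, (?<=ID ), to anchor the start of the match just to the right of "ID ", then grabs everything up to the next underscore. The /g modifier matters here: because the pattern has no capture group, a list-context match without /g would return 1 (success) instead of the matched text.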
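To make the zero-width nature of the look-behind concrete, here is a sketch comparing it with an ordinary capture group, and applying the same idea to the duration (the "took" pattern is an extension for illustration, not part of this answer):

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# Look-behind: "ID " must precede the match but is not part of it,
# so the matched text itself is the ID prefix (/g returns that text)
my ($id_lookbehind) = $data =~ m/(?<=ID )[^_]+/g;

# Equivalent without look-behind: match "ID " and capture what follows
my ($id_capture) = $data =~ m/ID ([^_]+)/;

# The same idea for the duration: the digits preceded by "took "
my ($msec) = $data =~ m/(?<=took )\d+/g;

print "$id_lookbehind $id_capture $msec\n";
```

Both ID variables hold 0091070818; the capture-group version needs no /g because it returns the captured group rather than the whole match.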


Here's some test code:

use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my ($id) = $data =~ m/(?<=ID )[^_]+/g;
print "$id\n";

Output:

0091070818



split is not the right tool here. Here is a regex that works, at least for the specific case you listed.

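use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# One match captures all three fields; if it fails, die instead of
# reading stale or undefined $1, $2, $3
my ($date, $id, $msec) =
    $data =~ m/^(\w+ \d+ \d\d:\d\d:\d\d).+saving ID (\d+).+took (\d+) msec$/
    or die "Unexpected line format: $data\n";

print "$date, $id, $msec\n";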
