Splitting a string using regex in Perl

Question

I need help splitting the following string into (Date, ID, msecs)

May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec

I only want the first part of the ID before the first underscore.

So this is what I want the output to look like

May 26 09:33:33, 0191070818, 180

I am having trouble figuring out what to put in the regex

use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split('/[]/', $data);

foreach my $val (@values) {
  print "$val\n";
}

exit 0;

Sobrique · Accepted Answer · 2015-07-20 15:18:28Z

4

OK. That split just isn't going to work. Because the pattern is inside single quotes, the slashes are part of the pattern text instead of acting as regex delimiters, so split never sees the regex you meant to write and you don't get the fields you're after.

Split 'cuts up' a string based on a field separator, which probably isn't what you want. E.g.

 split ( ' ', $data );

Will give you:

$VAR1 = [
          'May',
          '26',
          '09:33:33',
          'localhost',
          'archiver:',
          'saving',
          'ID',
          '0091070818_1432647213_489715',
          'took',
          '180',
          'msec'
        ];
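
As a side note on the separator itself, here is a minimal sketch of how the special ' ' separator differs from a whitespace regex. The input with leading spaces is made up for illustration; it is not from the question.

```perl
use strict;
use warnings;

# Hypothetical input with leading whitespace, to show the difference.
my $line = '  May 26 09:33:33';

# A single-space string is a special case: leading whitespace is skipped.
my @awk_style = split ' ', $line;

# A regex separator keeps the empty field in front of the leading whitespace.
my @regex_style = split /\s+/, $line;

print scalar(@awk_style),   " fields with ' '\n";
print scalar(@regex_style), " fields with /\\s+/\n";
```

For the sample line in the question there is no leading whitespace, so both forms give the same list.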

Given that your string doesn't really break into fields like that, I'd suggest a different approach:

You need to select the things you want out of it. Assuming there are no oddly formatted records mixed in:

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my ($time_str) = ( $data =~ m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );
my ($id)       = ( $data =~ m/(\d+)_/ );
my ($msec)     = ( $data =~ m/(\d+) msec/ );
print "$time_str, $id, $msec\n";

Note - you can combine your regex patterns (as some of the other examples show). I've kept them separate to make it clearer what's happening. The regular expression match is applied to $data (because of =~). The parts of the pattern in brackets () are captured and returned, and end up in the variable on the left-hand side.

(Note - you need the brackets in 'my ($msec)' because they put the assignment in list context, so you get the captured value rather than the result of the test (true/false).)
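
To make the point about the brackets concrete, here is a small sketch comparing the two forms on the same sample line. The variable name $matched is only for illustration.

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# Scalar context: the match only reports success (a true value).
my $matched = $data =~ m/(\d+) msec/;

# List context (note the parentheses): the match returns its captures.
my ($msec) = $data =~ m/(\d+) msec/;

print "scalar context: $matched\n";
print "list context:   $msec\n";
```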

answered Jul 20, 2015 at 15:18


2 Comments

user2007843 Over a year ago

Thank you! Now if I was reading multiple lines from a text file that were similar to that line would I do something like this? while(<IN>){ if(/saving ID/){ my ($time_str) = ( m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );

Sobrique Over a year ago

Yes, pretty much. Although I'd suggest using open ( my $input, "<", $filename ) or die $! and then using while ( <$input> ) { instead. (It's more or less the same, but better style)
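
A rough sketch of that loop, assuming the log lines look like the sample. To keep it self-contained it reads from an in-memory filehandle; the second and third lines are invented for the example, and with a real file you would pass the filename to open instead of \$log_text.

```perl
use strict;
use warnings;

# Stand-in for a real log file (hypothetical contents).
my $log_text = <<'END';
May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec
May 26 09:33:34 localhost archiver: rotating logs
May 26 09:33:35 localhost archiver: saving ID 0091070819_1432647215_489716 took 95 msec
END

# Lexical filehandle with three-argument open, as suggested above.
open( my $input, '<', \$log_text ) or die "Cannot open input: $!";

my @rows;
while ( my $line = <$input> ) {
    # Skip lines that are not "saving ID" records.
    next unless $line =~ /saving ID/;

    my ($time_str) = $line =~ m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/;
    my ($id)       = $line =~ m/ID (\d+)_/;
    my ($msec)     = $line =~ m/(\d+) msec/;

    push @rows, "$time_str, $id, $msec";
}
close $input;

print "$_\n" for @rows;
```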

Dave Cross · Accepted Answer · 2015-07-20 15:28:09Z

4

It might even be simplest to just split the data on whitespace (and then reconstruct the date by joining together the first three fields). It's not very sophisticated, but it gets the job done.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split(/\s+/, $data);

my $date = join ' ', @values[0,1,2];
my $id   = $values[7];
my $time = $values[9];

say "Date: $date";
say "ID:   $id";
say "Time: $time";

Which gives:

Date: May 26 09:33:33
ID:   0091070818_1432647213_489715
Time: 180
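
The question only asks for the part of the ID before the first underscore. One possible way to get that from the same field list, as a sketch, is to split the ID field again on '_' and keep the first piece:

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split /\s+/, $data;

# Field 7 holds the full ID; split it on '_' and keep only the first piece.
my ($id) = split /_/, $values[7];

print "ID: $id\n";
```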

answered Jul 20, 2015 at 15:28


choroba · Accepted Answer · 2015-07-27 16:04:54Z

3

split doesn't look like the correct tool for the job. I'd use a regex match:

my @values = $data =~ /^([[:alpha:]]{3}\s[0-9][0-9]\s[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) # date & time
                       \s.*?\sID\s
                       ([0-9]+)            # ID
                       .*\stook\s
                       ([0-9]+)            # duration
                       \smsec/x;
print join(',', @values), "\n";

edited Jul 27, 2015 at 16:04

answered Jul 20, 2015 at 15:17


2 Comments

Borodin Over a year ago

Using \d with the /a modifier is a nice alternative to [0-9]

choroba Over a year ago

@Borodin: In 5.14+, yes.
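
For anyone wondering what difference /a makes, here is a small sketch (it needs Perl 5.14 or later). The test character is an arbitrary non-ASCII digit picked for the example.

```perl
use strict;
use warnings;

# U+0663 is ARABIC-INDIC DIGIT THREE: a digit, but not one of 0-9.
my $non_ascii_digit = "\x{0663}";

# Without /a, \d matches any Unicode digit; with /a it matches only 0-9.
my $plain = $non_ascii_digit =~ /^\d$/  ? 'matches' : 'does not match';
my $ascii = $non_ascii_digit =~ /^\d$/a ? 'matches' : 'does not match';

print "\\d without /a: $plain\n";
print "\\d with /a:    $ascii\n";
```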

Borodin · Accepted Answer · 2015-07-20 15:20:21Z

It's probably best to do this with three separate patterns, as the code below demonstrates.

I've used the /x modifier so that I can put spaces in the regex patterns for improved readability.

Unless you are certain that your data will be well-formed (i.e. it is the output of a program), you should add tests to make sure that all three values are defined after the pattern match. Or you can directly test the pattern match itself.

use strict;
use warnings;
use v5.10;

my $s = 'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec';

for ( $s ) {

    my ($date)  = / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = / ID \s+ (\d+) _ /x;
    my ($msecs) = / (\d+) \s+ msec /x;

    say join ',', $date, $id, $msecs;
}

output

May 26 09:33:33,0191070818,180
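
One way the "make sure all three values are defined" check might look, as a sketch. The record here is a made-up malformed line with the duration missing, and the %field hash is just a convenience for reporting.

```perl
use strict;
use warnings;

# Hypothetical malformed record: the "took N msec" part is missing.
my $s = 'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705';

my ($date)  = $s =~ / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
my ($id)    = $s =~ / ID \s+ (\d+) _ /x;
my ($msecs) = $s =~ / (\d+) \s+ msec /x;

# Collect the names of any fields whose pattern did not match.
my %field   = ( date => $date, id => $id, msecs => $msecs );
my @missing = grep { !defined $field{$_} } sort keys %field;

if (@missing) {
    print "Skipping record, missing: @missing\n";
}
else {
    print join( ',', $date, $id, $msecs ), "\n";
}
```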

Bohemian · Accepted Answer · 2015-07-20 15:20:40Z

2

I don't know that split() is the best approach. This code matches your target ID and extracts it:

($id) = $data =~ m/(?<=ID )[^_]+/g;

The regex uses a look-behind (?<=ID ) to anchor the start of the match just to the right of "ID ", then grabs everything not an underscore that follows.

Here's some test code:

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my ($id) = $data =~ m/(?<=ID )[^_]+/g;
print "$id\n";

Output:

0091070818
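
A note on the /g in that pattern, as a small sketch: with no capture group, a list-context match without /g only returns (1) on success, so the flag is what makes the matched text come back. An explicit capture group is another option. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# No capture group and no /g: list context just yields (1) on success.
my ($without_g) = $data =~ m/(?<=ID )[^_]+/;

# No capture group, with /g: list context yields the matched text.
my ($with_g) = $data =~ m/(?<=ID )[^_]+/g;

# An explicit capture group returns the text without needing /g.
my ($captured) = $data =~ m/ID ([^_]+)/;

print "$without_g\n$with_g\n$captured\n";
```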


edited Jul 20, 2015 at 15:20

answered Jul 20, 2015 at 15:13


Andy Lester · Accepted Answer · 2015-07-20 15:20:04Z

1

split is not the tool to use here. Here is a regex that works at least for the specific case you listed.

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

$data =~ m/^(\w+ \d+ \d\d:\d\d:\d\d).+saving ID (\d+).+took (\d+) msec$/;

my ($date, $id, $msec) = ($1,$2,$3);

print "$date, $id, $msec\n";
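
If some lines might not fit the pattern, it is worth checking that the match succeeded before using the captures, because $1, $2 and $3 keep their values from the last successful match. A sketch of one way to do that, using the same pattern; the second line in @lines is invented for the example.

```perl
use strict;
use warnings;

# The second record is hypothetical and does not fit the pattern.
my @lines = (
    'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec',
    'May 26 09:33:34 localhost archiver: rotating logs',
);

my @results;
for my $line (@lines) {
    # In list context the match returns its captures, or an empty list on
    # failure, so the assignment doubles as the success test.
    if ( my ( $date, $id, $msec ) =
        $line =~ m/^(\w+ \d+ \d\d:\d\d:\d\d).+saving ID (\d+).+took (\d+) msec$/ )
    {
        push @results, "$date, $id, $msec";
    }
    else {
        push @results, 'no match';
    }
}

print "$_\n" for @results;
```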

answered Jul 20, 2015 at 15:20
