I'm trying to write a regular expression to parse the text below. Running it against values held in in-memory variables does not give the same result.

The regular expression below gives me the $1 and $2 I expect, but the rw capture varies. The options can appear in any order, so I want to extract the data irrespective of its position in the string.

\/vol\/(\w+)\?(\w+|\s+).*rw=(.*\w+)

My data:

__DATA__
/vol/vol1   -sec=sys,rw=h1:h2,anon=0
/vol/vol1/q1 -sec=sys,rw=h3:h4,anon=0,ro=h1:h2
/vol/vol2/q1  -sec=sys,root=host5,ro=h3:h5,rw=h1:h2,anon=0

I'm trying to capture the second and third groups (if it is a space it should return a space), and a list of entries in rw, ro and root.

    There's no ? in any of your strings, so \? won't match. Commented Feb 4, 2016 at 0:17
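
To illustrate the comment above, here is a minimal sketch of a pattern that treats the qtree part of the path as optional instead of demanding a literal question mark. The helper name vol_and_qtree is hypothetical, and the space fallback follows the requirement stated in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (volume, qtree) for a line starting with /vol/...
# The qtree falls back to a single space when the path has no third component.
sub vol_and_qtree {
    my ($line) = @_;
    # (?:/(\w+))? makes the whole "/qtree" part optional.
    # The original \? would have required a literal "?" in the data.
    return unless $line =~ m{^/vol/(\w+)(?:/(\w+))?};
    return ($1, $2 // ' ');
}

for my $line ('/vol/vol1   -sec=sys,rw=h1:h2,anon=0',
              '/vol/vol1/q1 -sec=sys,rw=h3:h4,anon=0,ro=h1:h2') {
    my ($vol, $q) = vol_and_qtree($line);
    print "vol=[$vol] qtree=[$q]\n";
}
```

The first line should give vol1 with a space for the qtree, the second vol1 and q1.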

3 Answers

The expression (.*\w+) is greedy: .* runs to the end of the line, so the capture extends to the last word character in the line. What you are looking for is most likely ([0-9a-z:]+).
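
A short sketch of the difference, using one of the data lines from the question. The character class assumes host names made only of lowercase letters and digits, as in the sample data; a class such as [^,]+ would be a looser alternative if real host names contain dots or hyphens.

```perl
use strict;
use warnings;

my $line = '/vol/vol1/q1 -sec=sys,rw=h3:h4,anon=0,ro=h1:h2';

# Greedy: .* runs to the end of the line and backtracks only as far as
# needed for \w+ to match, so the capture swallows the later options.
my ($greedy) = $line =~ /rw=(.*\w+)/;

# Character class: stops at the first character that is not a lowercase
# letter, digit or colon, which here is the comma after the host list.
my ($bounded) = $line =~ /rw=([0-9a-z:]+)/;

print "greedy:  $greedy\n";   # h3:h4,anon=0,ro=h1:h2
print "bounded: $bounded\n";  # h3:h4
```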

1 Comment

This works to get the values. I'm now trying the split approach, as these values could be in different positions. Thanks.

Guessing from your comment in reply to ikegami, maybe the following will give results you want.

#!/usr/bin/perl
use strict;
use warnings;

my @keys = qw/ rw ro root /;
my $wanted = join "|", @keys;

my %data;

while (<DATA>) {
    my ($path, $param) = split;
    my ($vol, $q) = (split '/', $path)[2,3];

    my %tmp = map {split /=/} grep /^(?:$wanted)/, split /,/, $param;

    $data{$vol}{$q // ' '} = \%tmp;
}

use Data::Dumper; print Dumper \%data;

__DATA__
/vol/vol1   -sec=sys,rw=h1:h2,anon=0
/vol/vol1/q1 -sec=sys,rw=h3:h4,anon=0,ro=h1:h2
/vol/vol2/q1  -sec=sys,root=host5,ro=h3:h5,rw=h1:h2,anon=0

The output from Data::Dumper is:

$VAR1 = {
          'vol2' => {
                      'q1' => {
                                'ro' => 'h3:h5',
                                'root' => 'host5',
                                'rw' => 'h1:h2'
                              }
                    },
          'vol1' => {
                      ' ' => {
                               'rw' => 'h1:h2'
                             },
                      'q1' => {
                                'ro' => 'h1:h2',
                                'rw' => 'h3:h4'
                              }
                    }
        };

Update: can you tell me what does (?:) mean in the grep?

(?: ... ) is a non-capturing group. It is needed here because the regex begins with ^. Without grouping, the ^ would apply only to the first alternative: the regex would match ro at the beginning of the string, or rw or root anywhere in the string (not just at the beginning).

That is, /^ro|rw|root/ rather than /^(?:ro|rw|root)/.

The grouped expression only attempts a match at the beginning of the string for all 3 patterns and does not try to match anywhere in the string. That avoids false matches on keys that merely contain rw or root, and it also speeds things up (although with only 3 alternatives it wouldn't make a huge difference here). It is still good practice to follow.
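
A small sketch of the two patterns side by side. The key norw is made up for illustration and does not appear in the original data.

```perl
use strict;
use warnings;

# 'norw' is a made-up key, used only to show an unanchored match.
my @fields = ('-sec=sys', 'rw=h1:h2', 'anon=0', 'norw=h9');

# Here ^ binds only to "ro"; "rw" and "root" may match anywhere in the string.
my @ungrouped = grep { /^ro|rw|root/ } @fields;

# Here ^ applies to every alternative inside the group.
my @grouped = grep { /^(?:ro|rw|root)/ } @fields;

print "ungrouped: @ungrouped\n";  # rw=h1:h2 norw=h9
print "grouped:   @grouped\n";    # rw=h1:h2
```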

what does (// ' ') stand for?

That is the defined-or operator. The expression $q // ' ' says to use $q as the hash key if it is defined, or a space otherwise.

You said in your original post: "I'm trying to capture the second and third groups (if it is a space it should return a space)."

$q can be undefined when the path passed to the split, my ($vol, $q) = (split '/', $path)[2,3];, has only a vol and no q, as in this data line: /vol/vol1 -sec=sys,rw=h1:h2,anon=0.
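
The following sketch contrasts // with ||, since the two differ only for values that are defined but false. The helper names and the qtree named 0 are hypothetical; // requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helpers that differ only in the operator used for the fallback.
sub key_defined_or { my ($q) = @_; return $q // ' ' }   # falls back only on undef
sub key_logical_or { my ($q) = @_; return $q || ' ' }   # falls back on any false value

for my $path ('/vol/vol1', '/vol/vol1/q1', '/vol/vol1/0') {
    # Same slice as in the answer: $q is undef when there is no fourth field.
    my ($vol, $q) = (split '/', $path)[2, 3];
    printf "%-13s //=[%s] ||=[%s]\n", $path, key_defined_or($q), key_logical_or($q);
}
```

For the first two paths both helpers agree. For the made-up qtree 0, // keeps the 0 while || replaces it with a space, which is why // is the safer choice for a hash key.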

10 Comments

Thank you, Chris. This is what I am looking for.
Hi Chris, I'm getting an error on the search expression. Can you tell me what (?:) means in the grep? Also, you have dumped the data into another hash with the required vol and q names. What does (// ' ') stand for?
@jsks See my update, which answers your questions. As for "getting an error on the search expression": that shouldn't be. Did you copy and paste the code? It runs fine here.
For the second question, in that case wouldn't it be || ' ' instead of two forward slashes (//)?
When I use print Dumper \%data inside the while loop, the output has the format I'm looking for. Inside the while loop with print Dumper \%data; $VAR1 = { 'vol2' => { 'q1' => { 'ro' => 'h3:h5', 'root' => 'host5', 'rw' => 'h1:h2' } } }; Outside the while loop with print Dumper \%data; $VAR1 = { 'vol1' => { 'q2' => { 'rw' => 'h1:h2' }, 'q1' => $VAR1->{'vol1'}{'q2'} } };
No idea what you want, but a regex would not make a good parser here.

while (<DATA>) {
   my ($path, $opts) = split;
   my %opts =
      map { my ($k,$v) = split(/=/, $_, 2); $k=>$v }
         split(/,/, $opts);

   ...
}

(my %opts = split(/[,=]/, $opts); might suffice.)
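
A rough sketch comparing the two approaches above on an option string that contains a flag without a value. The helper name parse_opts and the bare flag nosuid are hypothetical; they are not part of the original data.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the map/split from the answer above.
sub parse_opts {
    my ($opts) = @_;
    # LIMIT 2 keeps any further "=" inside the value; a bare flag gets undef.
    return map { my ($k, $v) = split /=/, $_, 2; ($k => $v) } split /,/, $opts;
}

my $opts = '-sec=sys,ro=h3:h5,rw=h1:h2,nosuid';   # 'nosuid' is a made-up bare flag

my %by_pair = parse_opts($opts);
my @flat    = split /[,=]/, $opts;

print "pairwise keys: ", join(' ', sort keys %by_pair), "\n";  # -sec nosuid ro rw
print "flat list has ", scalar(@flat), " elements\n";          # 7, an odd count
```

With the pairwise version every option becomes its own key. The one-line split yields an odd number of elements as soon as a flag has no value, so assigning it to a hash would misalign keys and values; it only suffices when every option has the form key=value.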

3 Comments

This is the end result I am looking for: vol1{q1}{rw => hosts, ro => hosts, root => hosts}. Checking the suggestion now.
What am I doing wrong here? while(<$fh>){ my($path, $opts)= split; $path=/\/vol\/(\w+)\/?(\w+|\s+)/; print "$1:$2 \n"; #print $path; %opts=map{chomp;s/-sec=sys//g;s/anon=0//g;my($k,$v)=split(/=/,$opts,2);$opts{$1}{$2}{$k}=$v;} print Dumper \%opts; } close $fh;
I don't know what that means.
