
I have a string from which I want to extract a single word with a number appended to it; the number may be different on each line:

This is string1 this is string
This is string11 
This is string6 and it is in this line

I want to parse this file and get the values of "stringXXX", where XXX ranges from 0 to 100.

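# suppose Abc.txt contains the above lines
open(my $fh, '<', 'Abc.txt') or die "Cannot open Abc.txt: $!";
my @abcFile = <$fh>;
close($fh);

foreach my $pattern (@abcFile) {
    if ($pattern =~ /string\d{1,3}/) {
        print $pattern;    # prints the whole line, not just stringXXX
    }
}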

The above prints the whole line; I want to get only stringXXX.

4 Answers

13

You need to capture it:

while ($pattern =~ /(string(100|\d{1,2}))/g) {
    print "$1\n";
}

Explanation:

  • The parentheses capture what they match into $1. If you have more than one set of parentheses, the 1st captures into $1, the 2nd into $2, and so on. In this case, $2 holds the actual number.
  • \d{1,2} matches 1 or 2 digits, which covers 0 to 99. The 100 alternative lets you capture 100 explicitly, since it's the only 3-digit number you want to match. It has to come first: Perl tries alternatives from left to right, so with \d{1,2} first, "string100" would match only as "string10".

Edit: fixed the order of the alternatives so that 100 is tried before \d{1,2}.
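To see the two capture groups and the effect of the alternation order in isolation, here is a small script. It is a sketch: the helper name extract_matches and the sample text are hypothetical, and the regex is the one from this answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collects every "stringN" match in $text as
# [whole match, number] pairs, using the regex from the answer above.
sub extract_matches {
    my ($text) = @_;
    my @found;
    # Outer parens -> $1 (e.g. "string100"), inner parens -> $2 (e.g. "100").
    while ($text =~ /(string(100|\d{1,2}))/g) {
        push @found, [ $1, $2 ];
    }
    return @found;
}

for my $pair (extract_matches("This is string11 and string100")) {
    my ($word, $num) = @$pair;
    print "$word -> $num\n";
}

# With the alternatives reversed, \d{1,2} is tried first and stops at "10".
my ($wrong) = "string100" =~ /(string(?:\d{1,2}|100))/;
print "reversed order: $wrong\n";
```

This prints "string11 -> 11", "string100 -> 100" and "reversed order: string10", which is the problem the edit note and the comment below refer to.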


2 Comments

(\d{1,2}) captures between 1 and 2 digits.
Your regexp captures '10' for 'string100'. It should be (100|\d{1,2}) to capture 100.
5

Abc.pl:

#!/usr/bin/perl -w
while (<>) {
    while (/(string(\d{1,3}))/g) {
        print "$1\n" if $2 <= 100;
    }
}

Example:

$ cat Abc.txt 
This is string1 this is string
This is string11 
This is string6 and it is in this line
string1 asdfa string2
string101 string3 string100 string1000
string9999 string001 string0001

$ perl Abc.pl Abc.txt
string1
string11
string6
string1
string2
string3
string100
string100
string001
string000

$ perl -nE'say $1 while /(string(?:100|\d{1,2}(?!\d)))/g' Abc.txt
string1
string11
string6
string1
string2
string3
string100
string100

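Note the difference between the outputs: the script also prints string001 and string000 (from the zero-padded string001 and string0001), while the one-liner skips them. Both print string100 a second time for string1000. Which is preferable depends on your needs.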
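If your needs are stricter than either version, one option is to reject leading zeros and require the whole token to match. The sketch below is a hypothetical third variant, not taken from either answer; the sub name strict_matches is illustrative, and \b treats letters, digits and underscores as word characters, so "my_string5" is rejected too.

```perl
use strict;
use warnings;

# Hypothetical stricter variant (not from either answer): accept only
# string0 .. string100 written without leading zeros, as a whole word.
sub strict_matches {
    my ($line) = @_;
    my @found;
    # 100, or 1-99 with no leading zero, or a lone 0. The \b on each side
    # rejects "string1000", "string001" and names glued to other word chars.
    while ($line =~ /\b(string(?:100|[1-9]\d?|0))\b/g) {
        push @found, $1;
    }
    return @found;
}

my @sample = (
    "string101 string3 string100 string1000",
    "string9999 string001 string0001",
);
print "$_\n" for map { strict_matches($_) } @sample;
```

On the last two lines of the sample file this reports only string3 and string100.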


-1

Don't overspecify. To capture the numeric portion, just use (\d+). This will capture a number of any length, so that someday, when the monkeys who are providing you with this file decide to expand their range up to 999, you will be covered. It's also less thought, both now when you are writing and later when you are maintaining.

Be strict in what you emit, but be liberal in what you accept.
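A minimal sketch of this approach, assuming the 0 to 100 limit from the question is still enforced, just outside the regex: (\d+) takes the whole number and a plain numeric comparison decides whether to keep it. The sub name numbered_words and the $max variable are hypothetical.

```perl
use strict;
use warnings;

# Sketch of the liberal approach: (\d+) takes the whole number, and the
# range is an ordinary numeric test that is easy to change later.
my $max = 100;    # assumed upper bound; raise it if the spec grows

sub numbered_words {
    my ($line, $limit) = @_;
    my @found;
    while ($line =~ /(string(\d+))/g) {
        # $2 is compared numerically, so "001" counts as 1
        push @found, $1 if $2 <= $limit;
    }
    return @found;
}

print "$_\n" for numbered_words("string101 string3 string100 string1000", $max);
```

Because (\d+) grabs all the digits, string1000 is read as 1000 and dropped instead of being truncated to string100; zero-padded names such as string001 still pass.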

2 Comments

It actually depends on the spec you're given. If you're writing a throwaway script to capture only these numbers, you don't want to use (\d+).
I can't figure it out, Nathan ... why not? If I'm just writing a throwaway script, I don't want to invest extra time to make the regex more complicated than that.
-2

Just change print $pattern to print $&, which is already captured.

2 Comments

Also, $& has bad performance implications for your entire program. See search.cpan.org/perldoc?Devel::SawAmpersand
0. Yeah, the regex was wrong, but using $& is the shortest code to print the correct result. 1. This is not library code, so the performance impact is the same as using $1. 2. The global PL_sawampersand hack is a Perl-internal implementation issue and should be fixed in perl.
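On perls older than 5.20, any use of $& anywhere made every later successful match copy its string; perlvar notes that the copy-on-write mechanism enabled in 5.20 removed this penalty. If the script must run on older perls, the /p modifier with ${^MATCH} (available since 5.10) gives the matched text without $&. A sketch, where first_match is a hypothetical name:

```perl
use strict;
use warnings;

# Returns the text matched by the pattern, or undef if nothing matched.
# The /p modifier (Perl 5.10+) makes ${^MATCH} available for this match
# without using $&; from Perl 5.20 on, /p is accepted but no longer needed.
sub first_match {
    my ($pattern) = @_;
    return $pattern =~ /string\d{1,3}/p ? ${^MATCH} : undef;
}

for my $pattern ("This is string1 this is string", "This is string11") {
    my $word = first_match($pattern);
    print "$word\n" if defined $word;
}
```

Like the $& answer, this reports only the first match per line; use a /g loop with a capture group when a line can contain several.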
