How can I find a substring within a string using Perl?

Question

I have a string from which I wish to extract a single word with a number appended to it; the number may differ on each line:

This is string1 this is string
This is string11 
This is string6 and it is in this line

I want to parse this file and get the values of "stringXXX", where XXX ranges from 0 to 100.

# suppose ABC.txt contains the above lines
FH1 = open "Abc.txt"; 
@abcFile = <FH1>;

foreach $line(@abcFile) {
    if ($pattern =~ s/string.(d{0}d{100});
        print $pattern;

The above prints the whole line; I wish to get only stringXXX.

Nathan Fellman · Accepted Answer · 2008-12-11 18:45:47Z

13

You need to capture it:

while ($pattern =~/(string(100|\d{1,2}))/g) {
    print $1;
}

Explanation:

The parentheses capture what's in them into $1. If you have more than one set of parens, the 1st captures into $1, the 2nd into $2, etc. In this case $2 will hold the actual number.
\d{1,2} matches 1 or 2 digits, allowing you to capture between 0 and 99. The additional 100 there allows you to capture 100 explicitly, since it's the only 3-digit number you want to match.
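To see which group ends up where, here is a minimal sketch using the same regex. The helper name `extract_tokens` and the sample line are made up for illustration; they are not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns [token, number] pairs.
sub extract_tokens {
    my ($line) = @_;
    my @pairs;
    # Outer parens fill $1 (e.g. "string11"); inner parens fill $2 (e.g. "11").
    while ($line =~ /(string(100|\d{1,2}))/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Made-up sample line for illustration.
for my $pair (extract_tokens("This is string11 and string100")) {
    print "token=$pair->[0] number=$pair->[1]\n";
}
```

Because 100 comes first in the alternation, "string100" is captured whole rather than as "string10".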

edit: fixed the order of the numbers that are captured.

edited Dec 11, 2008 at 18:45

answered Dec 8, 2008 at 4:39


2 Comments

jfs Over a year ago

(\d{1,2}) captures between 1 and 2 digits.

jfs Over a year ago

Your regexp captures '10' for 'string100'. It should be (100|\d{1,2}) to capture 100.

jfs · Answer · 2008-12-08 06:12:05Z

5

Abc.pl:

#!/usr/bin/perl -w
while (<>) {
    while (/(string(\d{1,3}))/g) {
        print "$1\n" if $2 <= 100;
    }
}

Example:

$ cat Abc.txt 
This is string1 this is string
This is string11 
This is string6 and it is in this line
string1 asdfa string2
string101 string3 string100 string1000
string9999 string001 string0001

$ perl Abc.pl Abc.txt
string1
string11
string6
string1
string2
string3
string100
string100
string001
string000

$ perl -nE"say $1 while /(string(?:100|\d{1,2}(?!\d)))/g" Abc.txt
string1
string11
string6
string1
string2
string3
string100
string100

Note the difference between the outputs. What is preferable depends on your needs.
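If neither output fits, for example because "string1000" should not yield "string100", one possible variant is to apply the negative lookahead to both alternatives. This is only a sketch: the helper name `strict_tokens` is hypothetical, and it assumes that tokens with leading zeros such as "string001" should be rejected as well.

```perl
use strict;
use warnings;

# Hypothetical helper: returns only tokens whose number is 0..100
# and is not followed by a further digit.
sub strict_tokens {
    my ($line) = @_;
    # (?!\d) sits outside the alternation, so it guards "100" too.
    my @found = $line =~ /(string(?:100|\d{1,2})(?!\d))/g;
    return @found;
}

print "$_\n" for strict_tokens("string101 string3 string100 string1000");
```

On that sample line this prints string3 and string100 only.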

edited Dec 8, 2008 at 6:12

answered Dec 8, 2008 at 5:33


skiphoppy · Answer · 2008-12-09 17:51:18Z

-1

Don't overspecify. To capture the numeric portion, just use (\d+). This will capture a number of any length, so that some day when the monkeys who are providing you with this file decide to expand their range up to 999, you will be covered. It also takes less thought, both now when you are writing, and later when you are maintaining.

Be strict in what you emit, but be liberal in what you accept.
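A rough sketch of that approach, assuming you still want to enforce an upper limit somewhere: capture with (\d+) and do the range check in ordinary code, so the limit is a parameter rather than part of the regex. The helper name `tokens_up_to` and its `$max` parameter are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: capture any number of digits, then filter numerically.
sub tokens_up_to {
    my ($line, $max) = @_;
    my @found;
    while ($line =~ /(string(\d+))/g) {
        # $2 is compared as a number, so "001" counts as 1.
        push @found, $1 if $2 <= $max;
    }
    return @found;
}

print "$_\n" for tokens_up_to("string101 string3 string100 string1000", 100);
```

Raising the limit to 999 later means changing the argument, not the pattern. Note that this version accepts leading zeros ("string001" passes the check).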

answered Dec 9, 2008 at 17:51


2 Comments

Nathan Fellman Over a year ago

it actually depends on the spec you're given. If you're writing a throwaway script to capture only these numbers, you don't want to use (\d+)

skiphoppy Over a year ago

I can't figure it out, Nathan ... why not? If I'm just writing a throwaway script, I don't want to invest extra time to make the regex more complicated than that.

ididak · Answer · 2008-12-08 05:21:20Z

-2

Just change print $pattern to print $&, which is already captured.
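For illustration, a small sketch of what $& holds after a successful match. The regex here is borrowed from the other answers, since the one in the question does not work as written, and both helper names are hypothetical. The second helper shows the per-match alternative available from Perl 5.10 on: the /p modifier together with ${^MATCH}.

```perl
use strict;
use warnings;

# Hypothetical helper: $& is the text matched by the last successful match.
sub whole_match {
    my ($line) = @_;
    return unless $line =~ /string(?:100|\d{1,2})/;
    return $&;
}

# Same idea with /p and ${^MATCH} (Perl 5.10 or later).
sub whole_match_p {
    my ($line) = @_;
    return unless $line =~ /string(?:100|\d{1,2})/p;
    return ${^MATCH};
}

print whole_match("This is string11 "), "\n";
print whole_match_p("This is string6 and it is in this line"), "\n";
```

Both return only the matched token, not the whole line, and only the first match on the line since there is no /g loop.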

answered Dec 8, 2008 at 5:21


2 Comments

mpeters Over a year ago

Also, $& has bad performance implications for your entire system. See search.cpan.org/perldoc?Devel::SawAmpersand

ididak Over a year ago

0. Yeah, the regex was wrong, but using $& is the shortest code to print the correct result. 1. This is not library code; the performance impact is the same as using $1. 2. The global PL_sawampersand hack is a perl internal implementation issue and should be fixed in perl.
