Regular expression in index function

Question

I am looking for occurrences of "CCGTCAATTC(A|C)TTT(A|G)AGT" in a text file.

$text = 'CCGTCAATTC(A|C)TTT(A|G)AGT';
if ($line =~ /$text/) {
    chomp($line);
    $pos = index($line, $text);
}

Searching works, but I am not able to get the position of "text" in the line. It seems index does not accept a regular expression as the substring.

How can I make this work? Thanks.

Michał Wojciechowski · Accepted Answer · 2011-09-11 22:15:55Z

14

The @- array holds the offsets of the starting positions of the last successful match. The first element is the offset of the whole matching pattern, and subsequent elements are offsets of parenthesized subpatterns. So, if you know there was a match, you can get its offset as $-[0].
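As a minimal sketch of the approach described above, the snippet below reads the match offset from $-[0] and then takes 50 characters starting at that offset with substr. The sample $line is a made-up string for illustration, not data from the question; $+[0] (the offset just past the end of the match) is shown only for comparison.

```perl
use strict;
use warnings;

# Hypothetical sample line; real data would come from the text file.
my $line = 'GGGG' . 'CCGTCAATTCATTTGAGT' . ('ACGT' x 20);

my ($pos, $end, $window);
if ($line =~ /CCGTCAATTC[AC]TTT[AG]AGT/) {
    $pos    = $-[0];                    # offset where the whole match starts
    $end    = $+[0];                    # offset just past the end of the match
    $window = substr($line, $pos, 50);  # the match plus the text after it, 50 chars in total
    print "start=$pos end=$end\n";
    print "$window\n";
}
```

Because substr starts at the match offset, the 50-character window includes the matched text itself rather than only what follows it.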

Mansoor Siddiqui · Accepted Answer · 2011-09-11 22:46:39Z

3

You don't need to use index at all, just a regex. The portion of $line that comes before your regex match will be stored in $` (or $PREMATCH if you've chosen to use English;). You can get the index of the match by checking the length of $`, and you can get the match itself from the $& (or $MATCH) variable:

use English;   # needed for $PREMATCH, $MATCH and $POSTMATCH

$text = 'CCGTCAATTC(A|C)TTT(A|G)AGT';
if ($line =~ /$text/) {
    $pos = length($PREMATCH);   # same as length($`)
}

Assuming you want to get $pos to continue matching on the remaining part of $line, you can use the $' (or $POSTMATCH) variable to get the portion of $line that comes after the match.

See http://perldoc.perl.org/perlvar.html for detailed information on these special variables.
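A small self-contained sketch of the three variables together may help. The sample $line is invented for illustration, and the character classes [AC] and [AG] are used in place of (A|C) and (A|G), which match the same text here.

```perl
use strict;
use warnings;
use English;   # enables $PREMATCH, $MATCH and $POSTMATCH

# Hypothetical sample line, not taken from the question.
my $line = 'GGGG' . 'CCGTCAATTCCTTTAAGT' . 'TTTTACGT';

my ($pos, $matched, $rest);
if ($line =~ /CCGTCAATTC[AC]TTT[AG]AGT/) {
    $pos     = length($PREMATCH);  # number of characters before the match
    $matched = $MATCH;             # the matched text itself
    $rest    = $POSTMATCH;         # everything after the match
    print "pos=$pos match=$matched rest=$rest\n";
}
```

Concatenating $PREMATCH, $MATCH and $POSTMATCH gives back the original line, so the matched text is not lost when continuing with the remainder.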

edited Sep 11, 2011 at 22:46

answered Sep 11, 2011 at 22:16

8 Comments

Deep Over a year ago

Yes, I can do that. But once I capture the position, I want to capture the next 50 chars: substr($line,$pos,50)

Mansoor Siddiqui Over a year ago

You can match on the remaining part of $line the way you said -- is that approach undesirable for some reason? You could also use the $' (or $POSTMATCH) variable to easily get the remaining part of $line.

Mansoor Siddiqui Over a year ago

Please see my amended answer; let me know if you're looking for something else.

Deep Over a year ago

Yes, you are correct. The only problem is that this way I will lose the text. I mean text+x=50 words

Deep Over a year ago

Yours worked, but as I thought, I am missing the matching string.

TLP · Accepted Answer · 2011-09-11 23:26:38Z

1

Based on your comments, it seems like what you are after is matching the 50 characters directly following the match. So, a simple solution would be:

my ($match) = $line =~ /CCGTCAATTC[AC]TTT[AG]AGT(.{50})/;

As you see, [AG] is equivalent to A|G. If you wish to match multiple times, you can use an array @matches, and the /g global option on the regex. E.g.

my @matches = $line =~ /CCGTCAATTC[AC]TTT[AG]AGT(.{50})/g;

You can do this to keep the matching pattern:

my ($pattern, $match) = $line =~ /(CCGTCAATTC[AC]TTT[AG]AGT)(.{50})/g;

Or in a loop:

while ($line =~ /(CCGTCAATTC[AC]TTT[AG]AGT)(.{50})/g) {
    my ($pattern, $match) = ($1, $2);
}
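Based on the comments under the earlier answer, the asker seems to want the matched text and the characters after it as one 50-character string. One possible sketch, under that assumption, is to loop with /g and cut the window with substr from the match offset $-[0]. The sample $line and the @hits array are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical line containing two occurrences of the motif.
my $line = 'AA' . 'CCGTCAATTCATTTAAGT' . ('G' x 40)
         . 'CCGTCAATTCCTTTGAGT' . ('T' x 40);

my @hits;
while ($line =~ /CCGTCAATTC[AC]TTT[AG]AGT/g) {
    my $start = $-[0];   # offset of the current match
    # Store the offset together with 50 chars starting at the match itself.
    push @hits, [ $start, substr($line, $start, 50) ];
}

printf "%d: %s\n", @$_ for @hits;
```

This avoids having to work out how many characters to put in .{N} after the motif, since the window length is fixed by substr rather than by the pattern.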

edited Sep 11, 2011 at 23:26

answered Sep 11, 2011 at 22:38

4 Comments

Deep Over a year ago

Actually, I need the matching chars too, so maybe I have to cheat and put 38 instead of 50

TLP Over a year ago

Wouldn't your question have been a whole lot simpler to answer if you'd said from the start what you wanted? =) Well, assuming you do know how many characters you want to capture, I think you can work out how to fix it.

Deep Over a year ago

This also gave me another idea, so thanks, TLP

Deep Over a year ago

Yes, I could have been a bit clearer about the next few steps I was trying to do... I will be more careful next time

Abdul Rahman · Accepted Answer · 2012-11-18 08:11:51Z

0

while ($line =~ /(CCGTCAATTC[AC]TTT[AG]AGT)(.{50})/g;) {

I like it, but there should be no ; inside the while condition.

I had a hard time tracking down the cause of the errors. T_T

edited Nov 18, 2012 at 8:11

Abdul Rahman

answered Nov 18, 2012 at 7:43

Shin
