I have this line:

[1] "RPKM_AB123_Gm12878_control.extended.bed_28m_control_500 and RPKM_AB156_GM12878-50ng_test.extended.bed_28m_test_500"

and I want to extract AB123_Gm12878_control and AB156_GM12878-50ng from the string.

I have tried this, but it isn't working yet:

if ($_ =~ /.*"RPKM_([\w.]+).extended.+\s\w+\sRPKM_([\w.]+).extended.+"/){
   print $1,"\t",$2,"\t";
}

Can someone point out where I went wrong? Thanks!

2 Answers

".*RPKM_([\w.]+).extended.+\s\w+\sRPKM_([\w.]+).extended.+"
                                        ^^^^^

This character class does not accept -, which the string you're matching against contains.

Try putting the hyphen in:

".*RPKM_([\w.]+)\.extended.+\s\w+\sRPKM_([\w.-]+)\.extended.+"

Also, it's good practice to escape the literal periods (\.), because an unescaped . matches any character.
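A minimal, self-contained check of the corrected pattern against the sample line from the question. It assumes the double quotes around the answer's pattern are just delimiters rather than part of the regex; the helper name `extract_pair` is hypothetical and only there to make the match easy to reuse.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two captured IDs, or an empty list on no match.
# The answer's surrounding double quotes are treated as delimiters, not pattern text.
sub extract_pair {
    my ($line) = @_;
    # Second class now allows "-"; "\." matches only a literal period.
    return $line =~ /.*RPKM_([\w.]+)\.extended.+\s\w+\sRPKM_([\w.-]+)\.extended.+/;
}

# Sample line from the question, including the "[1]" prefix and the quotes.
my $line = '[1] "RPKM_AB123_Gm12878_control.extended.bed_28m_control_500 and RPKM_AB156_GM12878-50ng_test.extended.bed_28m_test_500"';

# List assignment in boolean context is true when the match produced captures.
if (my ($first, $second) = extract_pair($line)) {
    print "$first\t$second\n";
}
```

With the sample line this prints AB123_Gm12878_control and AB156_GM12878-50ng_test separated by a tab. Because [\w.-]+ runs right up to .extended, the second ID keeps its _test suffix; getting exactly AB156_GM12878-50ng would need an extra rule for that suffix.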

4 Comments

The periods, on the other hand, should probably not be in the character classes.
@amon Probably. No idea whether the OP has strings where the parts to be extracted contain these...
Also, about the period: in this particular case I don't need the period in the final output. I read somewhere that people don't escape the period. Is it necessary to escape it?
@user2157668 It's safer to, so you don't end up with unexpected results!
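To make the "unexpected results" concrete: an unescaped . matches any single character, so a pattern meant to find a literal .extended also accepts other characters in that position. A small sketch; the input strings and the `compare_dot` helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (unescaped_result, escaped_result) as 1/0 for one string.
sub compare_dot {
    my ($s) = @_;
    my $loose  = $s =~ /control.extended/  ? 1 : 0;  # "." matches any character
    my $strict = $s =~ /control\.extended/ ? 1 : 0;  # "\." matches only a literal period
    return ($loose, $strict);
}

# One input with a real period, one with an underscore in its place.
for my $s ('control.extended', 'control_extended') {
    my ($loose, $strict) = compare_dot($s);
    print "$s: unescaped=$loose escaped=$strict\n";
}
```

The underscore version slips through the unescaped pattern but is rejected once the period is escaped.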

You can simplify the regex and match all occurrences using the /g modifier:

if ( my($m1, $m2) = /RPKM_([^.]+)/g ) {
  print $m1,"\t",$m2,"\t";
}
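In list context, a /g match returns the captures from every match in the string, which is why both IDs land in $m1 and $m2. The negated class [^.]+ stops at the first period, so each capture ends just before .extended. A runnable sketch using the question's line (the @ids name is only illustrative):

```perl
use strict;
use warnings;

my $line = '[1] "RPKM_AB123_Gm12878_control.extended.bed_28m_control_500 and RPKM_AB156_GM12878-50ng_test.extended.bed_28m_test_500"';

# List context + /g: one element per RPKM_ occurrence, each capture
# running up to (not including) the next period.
my @ids = $line =~ /RPKM_([^.]+)/g;

print join("\t", @ids), "\n";
```

Since @ids simply grows with the number of RPKM_ entries, the same line also handles inputs with more than two IDs; a while ($line =~ /RPKM_([^.]+)/g) { ... } loop reading $1 is the scalar-context alternative.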
