I'm using Perl to read in a file line by line and die if a condition is met. The condition is that a line begins with one of the letters A, G, C or T and the rest of the line contains only those letters.

my $fasta = $ARGV[0];

open(FASTA, $fasta) || die("Couldn't read file $fasta\n");
local $/ = "\n>";
while (my $line = <>) {
  if ( $line =~ /^[AGCT]/ && /[AGCT]/ ) {
    die;
  }
}
close FASTA;

I know the syntax in the regexp is wrong. I have tried many variations but can't get it to work. Any ideas?

  • You open FASTA, but then you do nothing with it. Your while loop reads from ARGV instead. Commented Mar 16, 2018 at 13:47
  • Ahh, thank you @melpomene. Commented Mar 16, 2018 at 13:53
  • Please see my updated answer. Commented Mar 16, 2018 at 15:16

1 Answer

Your regular expression syntax is correct. Your Perl expression in the if condition is wrong.

if ( 
    $line =~ /^[AGCT]/   # this tests $line
    && /[AGCT]/          # this defaults to $_
) {

You need to use $line =~ // explicitly both times.
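
To see why the bare match matters, here is a minimal sketch that puts unrelated text in $_ and compares the two conditions. The sample strings and the variable names $bare and $bound are made up for illustration.

```perl
use strict;
use warnings;

my $line = "AXXX";   # the string we mean to test
$_ = "zzzz";         # unrelated content that happens to be in $_

# Bare second match: silently tests $_, not $line.
my $bare  = ( $line =~ /^[AGCT]/ && /[AGCT]/ )          ? "match" : "no match";

# Explicit binding: both patterns test $line.
my $bound = ( $line =~ /^[AGCT]/ && $line =~ /[AGCT]/ ) ? "match" : "no match";

print "bare:  $bare\n";    # no match, because $_ holds no A, G, C or T
print "bound: $bound\n";   # match
```

Note that the bound version still accepts "AXXX": the second pattern only checks that one of the letters occurs somewhere, not that the whole line is made of them.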

In addition, your second pattern does not do what you want. You are missing a *$, and it would make sense to include the beginning character, too. It should read

/^[AGCT][AGCT]*$/

As you can see, essentially you can just have one pattern and be done with it.

if ( $line =~ /^[AGCT]+$/ ) { ... }

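You can shorten it even further by using the opposite pattern and requiring that it does not match. This is not strictly equivalent, though: unlike the pattern above, it also accepts an empty string.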

if ( $line !~ /[^AGCT]/ ) { ... }

This is a bit confusing, though, because of the double negation (!~ and [^...]).
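
The two conditions agree on ordinary input but differ at the edges. The following sketch runs both against a few sample strings; the helper names positive_form and negated_form are hypothetical wrappers around the two patterns shown above.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two conditions.
sub positive_form { my ($s) = @_; return $s =~ /^[AGCT]+$/ ? 1 : 0 }
sub negated_form  { my ($s) = @_; return $s !~ /[^AGCT]/   ? 1 : 0 }

for my $s ( "ACGT", "", "ACGT\n", "ACXT" ) {
    ( my $shown = $s ) =~ s/\n/\\n/g;   # make the newline visible in the output
    printf "%-9s positive=%d negated=%d\n",
        "'$shown'", positive_form($s), negated_form($s);
}
```

The empty string passes only the negated form, because there is no character that could fail. The string with a trailing newline passes only the positive form, because $ is allowed to match just before a final newline while [^AGCT] treats the newline as an invalid character.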

In any case, you should chomp your input first. I would write your program like this:

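use strict;
use warnings;

my $fasta = $ARGV[0];

# the trailing \n removes the line number from die's message
open my $fh, '<', $fasta or die "Couldn't read file $fasta: $!\n";

# read one line at a time (default $/) from the handle we just opened
while (my $line = <$fh>) {
  chomp $line;
  die if $line =~ /[^AGCT]/;
}
close $fh;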

That program will die if any of the lines contains something that is not A, G, C or T. I do believe that's what you wanted to do.

8 Comments

or $line =~ /[^AGCT]/
@choroba That's slightly different in that simbabque's regex allows a trailing newline whereas yours doesn't. Also, simbabque's regex doesn't accept empty strings.
@choroba and if you were going to do that, it might as well be $line =~ y/AGCT//c
$line =~ /^[AGCT]+$/ is not the same as $line !~ /[^AGCT]/. The latter matches an empty string.
There's also the issue that $line might be terminated by \n>. It probably can even include newlines.
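
The last two comments can be combined into a rough sketch: read whole records with "\n>" as the separator, drop the header line, join the wrapped sequence lines, and count offending characters with tr///c. This assumes the input is a FASTA file in which each record starts with a ">" header line; the helper name count_invalid and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: number of characters in a record's sequence that are
# not A, G, C or T. Assumes the first line of the record is the header.
sub count_invalid {
    my ($record) = @_;
    $record =~ s/^>//;                        # only the first record still has its ">"
    my ( $header, @seq_lines ) = split /\n/, $record;
    my $seq = join '', @seq_lines;            # undo line wrapping
    return ( $seq =~ tr/AGCT//c );            # /c counts everything outside the set
}

# In-memory sample standing in for the file (made-up data).
my $data = ">seq1\nACGT\nGGCC\n>seq2\nACNT\n";
open my $fh, '<', \$data or die "Couldn't open in-memory sample: $!\n";

{
    local $/ = "\n>";                         # one record per read
    while ( my $record = <$fh> ) {
        chomp $record;                        # strips the trailing "\n>" separator
        my ($name) = $record =~ /^>?(\S+)/;
        printf "%s: %d invalid\n", $name, count_invalid($record);
    }
}
close $fh;
```

With the sample data this reports 0 invalid characters for seq1 and 1 for seq2 (the N). Replacing the printf with a die when the count is non-zero gives the behaviour asked for in the question.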