
I'm trying to filter an array read from a delimited text file in my program. The array from this text file looks like this:

YCL049C                   1     511.2465  0 0 MFSK
YCL049C                   2    4422.3098  0 0 YLVTASSLFVALT
YCL049C                   3    1131.5600  0 0 DFYQVSFVK
YCL049C                   4    1911.0213  0 0 SIAPAIVNSSVIFHDVSR
YCL049C                   5     774.4059  0 0 GVAMGNVK
..
.

and the code I have for this section of the program is:

my @msfile_filtered;
my $msline;
foreach $msline (@msfile) {

    my ($name, $pnum, $m2c, $charge, $missed, $sequence) = split (" ", $msline);
    if (defined $amino) {

        if ($amino =~ /$sequence/i) {

            push (@msfile_filtered, $msline);

        }

    }
    else {

        push (@msfile_filtered, $msline);

    }

}

$amino will just be a letter input by the user, and it is matched against the last field, $sequence. The user does not have to supply $amino, so in that case I need to copy the array unchanged (hence the else statement). At the moment the @msfile_filtered array is empty and I am unsure why. Any ideas?

EDIT: just to clarify, there is only one space between each field. I copied and pasted this from Notepad++, so extra spaces were added. The file itself will only have one space between fields.

Thanks in advance!


2 Answers


The regex that tries to find matching rows is backwards. To find a needle in a haystack, you need to write $haystack =~ /needle/, not the other way around.
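
A minimal sketch of the difference, using one sequence from the question and an assumed user input of K (both values are hard-coded here purely for illustration):

```perl
use strict;
use warnings;

my $sequence = 'DFYQVSFVK';   # last field of one row from the question
my $amino    = 'K';           # assumed user input, for illustration only

# Backwards: asks whether the single letter contains the whole sequence.
my $backwards = ($amino =~ /$sequence/i) ? 1 : 0;

# Correct: asks whether the sequence contains the letter.
my $correct = ($sequence =~ /$amino/i) ? 1 : 0;

print "backwards: $backwards, correct: $correct\n";   # backwards: 0, correct: 1
```

The backwards test can only succeed when the whole sequence fits inside the letter, which is why every row was being rejected.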

Also, to simplify your logic, if $amino is undef, skip the loop entirely. I would rewrite your code as follows:

if (defined $amino)
{
    foreach $msline (@msfile)
    {
        my ($name, $pnum, $m2c, $charge, $missed, $sequence) = split(" ", $msline);
        push @msfile_filtered, $msline if ($sequence =~ /$amino/i);
    }
} else
{
    @msfile_filtered = @msfile;
}

You could simplify this further down to a single grep statement, but that begins to get hard to read. An example of such a line might be:

@msfile_filtered =
    defined $amino
        ? grep { ( split(" ", $_ ) )[5] =~ /$amino/i } @msfile
        : @msfile;

5 Comments

I used the example that was in the original question, right at the top. (The five rows that begin with YCL049C.)
Thanks for your help! When no input for $amino is entered, the loop is skipped fine and the array remains the same, but when I enter a value for $amino, @msfile_filtered does not contain only sequences with the specified letter. In fact it contains nothing at all. I'll have a play around and report back.
I am using getopts, and the statement "my $amino = (defined $opt_a);" to try to turn the command-line input into $amino. When I enter "-a A" on the command line, for example, "print $amino" returns "1".
One reason this is not working is that you are matching your pattern in reverse; you need to swap $sequence and $amino here. Also, you could just use split directly in this case if you know for sure there is always one space between the given fields, e.g. foreach (@msfile) { push @msfile_filtered, $_ if (split)[5] =~ /$amino/i; }
@user2941526: I don't think you want $amino = (defined $opt_a). I think just $amino = $opt_a will do.
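
A small sketch of why "print $amino" showed 1. It assumes Getopt::Std with the hash-reference form of getopts (the question uses the $opt_a global instead), and it fakes the command line by setting @ARGV by hand:

```perl
use strict;
use warnings;
use Getopt::Std;

# Simulate running the script with "-a A"; a real script would leave @ARGV alone.
local @ARGV = ('-a', 'A');

my %opts;
getopts('a:', \%opts);          # "a:" means -a takes an argument

my $flag  = defined $opts{a};   # true/false only: prints as 1 when -a was given
my $amino = $opts{a};           # the letter itself; undef when -a is omitted

print "flag: $flag, amino: $amino\n";   # flag: 1, amino: A
```

Assigning the option value directly keeps the "defined $amino" check in the filtering code meaningful, since $amino stays undef when -a is not supplied.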

The split should handle more than one whitespace character between fields, and the regex operands are the wrong way round.

First debug to check that values are correct after the split.

Also, you must swap your regex variables like this:

 if ($sequence =~ /$amino/i) {

Right now you're checking whether $amino contains $sequence, which obviously it doesn't.

3 Comments

split is working correctly; the copy and paste from Notepad++ was a bit misleading. The file itself only has one space between fields.
@folbs: The regex is indeed the root cause.
Splitting on a single literal space is a special case. It trims leading whitespace, then splits on /\s+/. See perldoc.perl.org/functions/split.html
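
A quick way to see that special case, as a sketch using one row pasted from the question with its extra spaces intact (the variable names are made up for this example):

```perl
use strict;
use warnings;

my $line = 'YCL049C                   3    1131.5600  0 0 DFYQVSFVK';

# Special case: the string ' ' treats any run of whitespace as one separator.
my @awk_mode = split ' ', $line;

# A regex of one space: every single space separates, producing empty fields.
my @literal = split / /, $line;

print scalar(@awk_mode), " fields, last is $awk_mode[5]\n";   # 6 fields, last is DFYQVSFVK
print scalar(@literal), " fields with a single-space regex\n";
```

So the split(" ", $msline) call in the question copes with either spacing, and index 5 is always the sequence.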
