I combined two sequence files, so I now have one file with two sequences. I split these sequences into a @chars array because I later have to compare them character by character. One of the sequences, however, is spread over two lines. I want to use the join function to combine the two lines, but I don't know how.

Ex:

seq 1

ACGTATATATTATATCTGGCGCTATCGATGCTATCGAT
CGATGCGCG

seq 2

AGTGAGCGTAGCTAGCGGCGCGATCTAGCTA

My code so far:

#!/usr/bin/perl
use strict;
use warnings;

# open file 1
open (my $seq1, "<", "file1.fa") or die $!;
# open file 2
open (my $seq2, "<", "file2.fa") or die $!;
# open combined file
open (my $combined, ">", "combined.txt") or die $!;

# read file 1, skip header line, write to combined file
while (my $line = <$seq1>) {
        if ($line =~ />/) {
                next;
        }
        else {
                print $combined "$line\n";
        }
}
# read file 2, skip header line, write to combined file on new line
while (my $line2 = <$seq2>) {
        if ($line2 =~ />/) {
                next;
        }
        else {
                print $combined "$line2\n";
        }
}
# need to open combined file for reading
open (my $combined2, "<", "combined.txt") or die $!;
# read through combined file line by line
while (my $seqs = <$combined2>) {
        chomp($seqs);
# split sequences into characters
        my @chars = split(//, $seqs);
# the sequence from file1 is on 2 separate lines. Need to join these
# lines together
  • How do you know when to join the sequences? How do you know they are broken between two lines and need to be combined? Commented Feb 19, 2013 at 5:55
  • What are you trying to produce? Commented Feb 19, 2013 at 17:32

2 Answers

Consider using Bio::SeqIO to read your fasta files, as it can handle a sequence that's on multiple lines:

use strict;
use warnings;
use Bio::SeqIO;

my $in = Bio::SeqIO->new( -file => "file1.fa", '-format' => 'Fasta' );

while ( my $seq = $in->next_seq ) {
    my $sequence = $seq->seq;
    print $sequence, "\n";
}

Contents of file1.fa:

>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>seq1
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME
LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
>seq2
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
>seq3
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK

Output:

FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
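
If installing BioPerl is not an option, the same joining can be approximated in core Perl with the join function the question asks about. The following is only a minimal sketch, not how Bio::SeqIO works internally: read_fasta is a hypothetical helper name, and it assumes every record starts with a ">" header line. The demo reads the example sequence from the question through an in-memory filehandle so that it runs without any files on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read FASTA records from an
# open filehandle and return each sequence joined into a single string.
# Assumes every record begins with a header line starting with ">".
sub read_fasta {
    my ($fh) = @_;
    my @seqs;     # finished sequences, one string each
    my @parts;    # lines of the record currently being read
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;    # strip the newline and any trailing whitespace
        if ($line =~ /^>/) {
            # a header closes the previous record, if there was one
            push @seqs, join("", @parts) if @parts;
            @parts = ();
        }
        elsif (length $line) {
            push @parts, $line;
        }
    }
    push @seqs, join("", @parts) if @parts;    # the last record
    return @seqs;
}

# Demo: "seq 1" from the question, split over two lines
my $fasta = ">seq 1\nACGTATATATTATATCTGGCGCTATCGATGCTATCGAT\nCGATGCGCG\n";
open(my $fh, "<", \$fasta) or die $!;
my @seqs = read_fasta($fh);
close($fh);
print "$_\n" for @seqs;
```

To read a real file, open it with open(my $fh, "<", "file1.fa") and pass $fh to the helper in the same way.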

I am assuming that each sequence starts with a header line containing a ">" sign, and that this is why you used if($line =~ />/) to skip it. If that is not the case, comment back and I'll change the code. Try the following:

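use strict;
use warnings;

# open file 1
open (my $fil1, "<", "file1.fa") or die $!;
# open file 2
open (my $fil2, "<", "file2.fa") or die $!;
# open combined file
open (my $combined, ">", "combined.txt") or die $!;

# read file 1: a header starts a new output line, sequence lines are joined
while (my $line = <$fil1>) {
        chomp($line);
        if ($line =~ /^>/) {
                # no blank line before the first header of the file
                print $combined "\n" if $. > 1;
        }
        else {
                print $combined $line;
        }
}
# end the last sequence of file 1
print $combined "\n";
# read file 2 the same way, so its sequences start on a new line
while (my $line2 = <$fil2>) {
        chomp($line2);
        if ($line2 =~ /^>/) {
                print $combined "\n" if $. > 1;
        }
        else {
                print $combined $line2;
        }
}
print $combined "\n";
# flush the output before combined.txt is opened for reading
close($combined) or die $!;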

Then check combined.txt to see whether any sequence is still split across lines.
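
Once each sequence sits on a single line, the character-by-character comparison mentioned in the question could look roughly like the sketch below. mismatch_positions is a hypothetical helper name, the two input strings are toy data rather than the sequences from the question, and comparing only up to the length of the shorter sequence is an assumption about what is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based positions at which two sequences
# differ, compared only over the length of the shorter sequence (assumption).
sub mismatch_positions {
    my ($seq_a, $seq_b) = @_;
    my @a = split(//, $seq_a);
    my @b = split(//, $seq_b);
    my $len = scalar(@a) < scalar(@b) ? scalar(@a) : scalar(@b);
    my @diff;
    for my $i (0 .. $len - 1) {
        push @diff, $i if $a[$i] ne $b[$i];
    }
    return @diff;
}

# Toy input, not taken from the question
my @diff = mismatch_positions("ACGTAT", "AGGTAA");
print "mismatches at: @diff\n";
```

With real data, each line read from combined.txt would be chomped and passed in as one of the two arguments.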
