I am an absolute newbie to Perl, and to programming in general (less than a month's experience).
I am stuck on a problem that needs to be solved before I can tackle a bigger issue.
Basically, I have two arrays that look like this:

```perl
@array1 = ('NM_1234' , '1452' , 'NM_345' , '5008' , 'NR_6145' , '256');
@array2 = ('NM_5673' , '2' , 'NM_345' , '5' , 'NR_6145' , '10');
```
@array1 contains id numbers, each followed by a length. Each id identifies a nucleotide sequence, and the length is the length of that sequence.
@array2 contains id numbers, each followed by the number of G-quadruplex structures in that sequence; some sequences contain only 2 such structures, while others contain 10 or more.
The basic problem is that, for every id number that appears in both arrays, I need to add the corresponding length from @array1 (e.g. 5008, 256) to @array2.
So, for example, as NM_345 appears in both arrays, I need to add 5008 to it, so that the final result becomes NM_345,5,5008.
The same applies to NR_6145 and every other match (there are over 20,000 id numbers in @array2).
So far, I have only been able to write code that finds the same id number in both arrays. Here is the code:
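```perl
use strict;
use warnings;

# Enter file name
print "Enter file name: ";
my $in = <STDIN>;
chomp $in;
open(my $fasta, '<', $in) or die "Cannot open $in: $!";
chomp(my @data2 = <$fasta>);    # Read in data, one element per line, newlines removed
close($fasta);

# Enter 2nd file name
print "\n\nEnter 2nd file name: ";
my $in2 = <STDIN>;
chomp $in2;
open(my $fasta2, '<', $in2) or die "Cannot open $in2: $!";
chomp(my @entry2 = <$fasta2>);  # Read in data, one element per line, newlines removed
close($fasta2);

# Record every id from the first file. The id pattern (two capital letters,
# '_', digits, e.g. NM_345) is an assumption; number-only lines do not match it.
my %seen;
for my $item (@data2) {
    if ($item =~ /^([A-Z]{2}_[0-9]+)/) {
        my $key = $1;    # the captured id becomes the hash key
        push @{$seen{$key}}, $item;#WHAT IS THIS DOING? HOW?
    }
}

# Print every id from the second file that also occurs in the first.
for my $item (@entry2) {
    if ($item =~ /^([A-Z]{2}_[0-9]+)/) {
        my $key = $1;
        if (exists $seen{$key}) {
            print $item, "\n";
        }
    }
}
exit;
```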
I adapted the code that finds the elements common to two arrays from this answer, so full credit goes to Chas. Owens: https://stackoverflow.com/a/1064929/1468737. However, I do not yet understand this part:
push @{$seen{$key}}, $item;#WHAT IS THIS DOING? HOW?
Is it some kind of array stored in a hash value?
So now, how do I add the length element from @array1 to @array2? I think I need to use the splice function, but how?
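The line `push @{$seen{$key}}, $item;` builds a hash of arrays: each hash value is a reference to an array, and Perl's autovivification creates that array the first time `@{ ... }` is applied to an undefined hash value. A small standalone sketch with made-up input; here `$key` is simply set to the whole line for illustration:

```perl
use strict;
use warnings;

my %seen;
my @lines = ('NM_345', 'NR_6145', 'NM_345');    # made-up sample lines

for my $item (@lines) {
    my $key = $item;    # illustrative: use the line itself as the key
    # $seen{$key} starts out undefined. Dereferencing it as an array with
    # @{ ... } makes Perl create ("autovivify") a new anonymous array and
    # store a reference to it in $seen{$key}; push then appends $item to it.
    push @{ $seen{$key} }, $item;
}

for my $key (sort keys %seen) {
    my $count = scalar @{ $seen{$key} };
    print "$key => [@{ $seen{$key} }] ($count)\n";
}
# NM_345 => [NM_345 NM_345] (2)
# NR_6145 => [NR_6145] (1)
```

In the matching code only `exists $seen{$key}` is checked later, so the arrays mainly serve as a record that the key was seen.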
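splice is not really needed for this. One common approach is a hash lookup: assign @array1 to a hash (a flat list of id, length pairs becomes id => length entries), then walk @array2 two elements at a time and keep only ids that also have a length. A minimal sketch, assuming both arrays strictly alternate id and value; the names `%length_of` and `@merged` are illustrative:

```perl
use strict;
use warnings;

my @array1 = ('NM_1234', '1452', 'NM_345', '5008', 'NR_6145', '256');
my @array2 = ('NM_5673', '2', 'NM_345', '5', 'NR_6145', '10');

# A flat (id, length, id, length, ...) list becomes id => length pairs.
my %length_of = @array1;

my @merged;
# Step through @array2 in (id, count) pairs.
for (my $i = 0; $i < $#array2; $i += 2) {
    my ($id, $count) = @array2[$i, $i + 1];
    next unless exists $length_of{$id};    # no matching id in @array1
    push @merged, join(',', $id, $count, $length_of{$id});
}

print "$_\n" for @merged;
# NM_345,5,5008
# NR_6145,10,256
```

If an id appeared twice in @array1, the later length would overwrite the earlier one in the hash.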
My desired output looks like this:
NM_345,5,5008
NR_6145,10,256
etc.
I also need to save this output to a file, which will later be analyzed to see whether there is any correlation between length and G-quadruplex number.
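Saving the lines to a file only needs a three-argument `open` in write mode (`'>'`). A minimal sketch; the output name `merged.csv` and the sample `@merged` records are placeholders for illustration:

```perl
use strict;
use warnings;

# Hypothetical merged records in "id,count,length" form.
my @merged = ('NM_345,5,5008', 'NR_6145,10,256');
my $out_file = 'merged.csv';    # hypothetical output file name

# '>' creates the file, or truncates it if it already exists.
open(my $out, '>', $out_file) or die "Cannot open $out_file: $!";
print {$out} "$_\n" for @merged;    # one comma-separated record per line
close($out) or die "Cannot close $out_file: $!";
```

A comma-separated file like this can be loaded by most spreadsheet or statistics tools for the correlation analysis.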
Any help or input will be deeply appreciated.
Thank you for taking the time to go through my problem!
EDIT: This edit shows what the data files look like. They are basically output files from other programs I wrote.
My first file, named Transcriptlength.fa, with over 40,000 id numbers going into @array1, looks like this:
NR_037701
3353
NM_198399
2414
NR_026816
601
NR_027917
658
NR_002777
1278
My second file, named Quadcount.AllGtranscripts.fa, with over 20,000 id numbers going into @array2, looks like this:
NM_000014
1
NM_000016
3
NM_000017
19
NM_000018
2
NM_000019
3
NM_000020
30
NM_000021
1
NM_000022
2
NM_000023
5
NM_000024
1
NM_000025
15
NM_000029
5
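Because each record in these files spans two lines (an id, then a number), a file can be read straight into a hash, skipping the join/split steps. A minimal sketch, assuming there are no blank lines and every id line is followed by its value line; the sub name `read_pairs` is hypothetical, and the demo reads an in-memory copy of the first lines of Transcriptlength.fa so it runs without the real file:

```perl
use strict;
use warnings;

# Read alternating "id" / "value" lines from an open filehandle into a hash.
sub read_pairs {
    my ($fh) = @_;
    my %value_of;
    while (my $id = <$fh>) {
        my $value = <$fh>;
        last unless defined $value;    # ignore a trailing id with no value
        chomp($id, $value);
        $value_of{$id} = $value;
    }
    return %value_of;
}

# For the real file: open(my $fh, '<', 'Transcriptlength.fa') or die $!;
my $sample = "NR_037701\n3353\nNM_198399\n2414\n";
open(my $fh, '<', \$sample) or die "Cannot open sample: $!";
my %length_of = read_pairs($fh);
close($fh);

print "$_ => $length_of{$_}\n" for sort keys %length_of;
# NM_198399 => 2414
# NR_037701 => 3353
```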
my %data; $data{ 'a_sequence' } = ( $count, $length ); which assigns an array reference to a hash key. Read perldoc perlreftut.
( $count, $length ) is not a reference, it's a list. [ $count, $length ] should be used instead.
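The difference between a parenthesized list and square brackets is easy to see directly: assigning `( $count, $length )` to a scalar (such as a hash value) keeps only the last element, because the comma acts as the comma operator in scalar context, while `[ $count, $length ]` builds an anonymous array and stores a reference to it. A small sketch with made-up values:

```perl
use strict;
use warnings;

my ($count, $length) = (5, 5008);    # made-up values
my %data;

{
    no warnings 'void';    # silence "Useless use of ... in void context"
    $data{'with_parens'} = ($count, $length);    # scalar context: keeps 5008 only
}
$data{'brackets'} = [$count, $length];           # reference to a 2-element array

print "parens:   $data{'with_parens'}\n";        # parens:   5008
print "brackets: @{ $data{'brackets'} }\n";      # brackets: 5 5008
print "count via ref: ", $data{'brackets'}[0], "\n";    # count via ref: 5
```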