I have data like this:

Group AT1G01040-TAIR-G
        LOC_Os03g02970 69%
Group AT1G01050-TAIR-G
        LOC_Os10g26600 85%
        LOC_Os10g26633 35%
Group AT1G01090-TAIR-G
        LOC_Os04g02900 74%

How can I create a data structure that looks like this:

print Dumper \%big;

$VAR = { "Group AT1G01040-TAIR-G" => ['LOC_Os03g02970 69%'],
         "Group AT1G01050-TAIR-G" => ['LOC_Os10g26600 85%','LOC_Os10g26633 35%'],
         "Group AT1G01090-TAIR-G" => ['LOC_Os04g02900 74%']};

This is my attempt, but it fails:

my %big;
while ( <> ) {
    chomp;
    my $line = $_;
    my $head = "";
    my @temp;

    if ( $line =~ /^Group/ ) {
        $head = $line;
        $head =~ s/[\r\s]+//g;
        @temp = ();


    }
    elsif ($line =~ /^\t/){
        my $cont = $line;
           $cont =~ s/[\t\r]+//g;
        push @temp, $cont;

        push @{$big{$head}},@temp;
    };

}
    Why not produce a hash of arrays of hashes? Your data structure would then look like: "Group AT1G01040-TAIR-G" => [{'LOC_Os03g02970' => 69}] (in case you need to do some calculation with the values, or store them in XML, or ...?) Commented Aug 20, 2011 at 7:39

3 Answers

Here's how I'd do it:

my %big;
my $currentGroup;

while (my $line = <> ) {
    chomp $line;

    if ( $line =~ /^Group/ ) {
        $big{$line} = $currentGroup = [];
    }
    elsif ($line =~ s/^\t+//) {
        push @$currentGroup, $line;
    }
}

You should probably add some additional error checking to this, e.g. an else clause to warn about lines that don't match either regex. Also, check to see if $currentGroup is undef before pushing (in case the first line begins with a tab instead of "Group").
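The checks suggested above could be sketched roughly as follows. The sub name parse_groups and the warning texts are hypothetical, chosen only for illustration; the sub takes a list of lines instead of reading from <> so that the example is self-contained. It assumes member lines are indented with tabs, as in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of arrays from "Group ..." headers
# followed by tab-indented member lines.
sub parse_groups {
    my @lines = @_;          # copy, so chomp never touches read-only values
    my %big;
    my $currentGroup;

    for my $line (@lines) {
        chomp $line;

        if ( $line =~ /^Group/ ) {
            # Start a new group; keep a reference to its (empty) array.
            $big{$line} = $currentGroup = [];
        }
        elsif ( $line =~ s/^\t+// ) {
            if ( defined $currentGroup ) {
                push @$currentGroup, $line;
            }
            else {
                # Member line seen before any "Group" header.
                warn "Ignoring member line before first Group: $line\n";
            }
        }
        elsif ( length $line ) {
            # Non-empty line that matches neither pattern.
            warn "Ignoring unrecognized line: $line\n";
        }
    }
    return \%big;
}

my $big = parse_groups(
    "\tLOC_Os99g00000 10%\n",       # made-up orphan line, triggers a warning
    "Group AT1G01040-TAIR-G\n",
    "\tLOC_Os03g02970 69%\n",
    "Group AT1G01050-TAIR-G\n",
    "\tLOC_Os10g26600 85%\n",
    "\tLOC_Os10g26633 35%\n",
);
```

Whether to warn, die, or silently skip bad lines depends on how much you trust the input; warn is used here only so that problems are visible without stopping the run.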

The biggest problem with your original code is that you're declaring and initializing $head and @temp inside the loop, which means they get reset on every line. Variables that need to persist across lines have to be declared outside the loop, as I've done with $currentGroup.

I'm not quite sure what you're intending to accomplish with the s/[\r\s]+//g; bit. \r is included in \s, so that means the same as s/\s+//g; (which would strip all whitespace), but your desired result hash includes whitespace in your keys. If you want to strip trailing whitespace, you need to include an anchor: s/\s+\z//.
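To make the difference concrete, here is a small sketch comparing the two substitutions on a made-up header string with a trailing space and carriage return (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Sample header with trailing whitespace, as might come from a CRLF file.
my $header = "Group AT1G01040-TAIR-G \r";

# Unanchored with /g: removes every whitespace character,
# including the space inside the key.
( my $all_stripped = $header ) =~ s/\s+//g;

# Anchored at end of string: removes only the trailing whitespace.
( my $trimmed = $header ) =~ s/\s+\z//;

print "[$all_stripped]\n";    # [GroupAT1G01040-TAIR-G]
print "[$trimmed]\n";         # [Group AT1G01040-TAIR-G]
```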

1 Comment

I think I would do the same, but also make a reference out of %big ($big->{$line}). It took my Perl masters a lot of time to teach/beat references into me until I understood them, but I can't live without them anymore. I'm now trying to convince my current colleagues.
Well, I don't want to give you an answer, so I'll just tell you to look at:

Well, there ya go :-).

4 Comments

+1 for perlreftut, one of the most useful docs in all of perldoc!
Thanks @Joel. I found it very useful in learning to use References in Perl.
That comment got me thinking about my favorite perldocs, and I made a little blog post about some of them.
@Joel: Nice blog post! Though I would say you need to read more than just those three Perldocs to learn Perl.
You're pushing arrays onto your hash item. You should just be pushing the values. (You don't need @temp at all.)

push @{$big{$head}}, $cont;

Also, $head must be declared outside your loop; otherwise it loses its value after each iteration.
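Putting both fixes together, a minimal corrected version of the original loop might look like the sketch below. To keep it self-contained it reads from an in-memory filehandle over a sample string rather than from <>; that, and the choice to trim only trailing whitespace from the header, are assumptions made for the example.

```perl
use strict;
use warnings;

# Sample input (assumes member lines start with a tab).
my $data = "Group AT1G01040-TAIR-G\n"
         . "\tLOC_Os03g02970 69%\n"
         . "Group AT1G01050-TAIR-G\n"
         . "\tLOC_Os10g26600 85%\n"
         . "\tLOC_Os10g26633 35%\n";

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my %big;
my $head = "";    # declared outside the loop so it survives across lines

while ( my $line = <$fh> ) {
    chomp $line;

    if ( $line =~ /^Group/ ) {
        $head = $line;
        $head =~ s/\s+\z//;    # trim trailing whitespace only
    }
    elsif ( $line =~ /^\t/ ) {
        ( my $cont = $line ) =~ s/[\t\r]+//g;
        push @{ $big{$head} }, $cont;    # push the value itself; no @temp
    }
}
close $fh;
```

Running this and dumping %big with Data::Dumper should give one array reference per "Group" key, with the member lines in input order.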
