0

I have an array with n number of elments. I want to add each of the elements to a complex hash, each as a key/value pair. If the number of elements were fixed, say three, I would do:

my %hash;
my @array = ("first", "second", "third");
$hash{$array[0]}{$array[1]}{$array[2]}++;

The structure I want to end up with, is this (printed with Data::Dumper):

$VAR1 = 'first';
$VAR2 = {
        'second' => {
                   'third' => 1
};

But I am at loss at achieving the same structure when the number of elements in the array isn't fixed. Something with anonymous variables and iterating through the array, yes, but something like

foreach @array{$hash{$_}++};
will obviously only make one entry per element, and not the desired structure. Help?

2 Answers 2

4

Something like this could build the structure you desire for N elements:

use strict;
use warnings; 

use Data::Dumper;
my @array = qw(first second third four five six seven);

my $hash;
foreach my $key ( reverse @array ) {
   $hash = { $key => $hash };
}
print Dumper $hash;

__END__
$VAR1 = {
      'first' => {
                   'second' => {
                                 'third' => {
                                              'fourth' => {
                                                          'fifth' => {
                                                                      'sixth' => {
                                                                                 'seventh' => undef
                                                                               }
                                                                    }
                                                        }
                                            }
                               }
                 }
    };

It is not clear what you really need this for. There may be a better solution if you explain your use-case a little more. Incrementing this structure doesn't appear to be very easy.

After playing around a little, you can increment by traversing the hash reference to the bottom then incrementing the value of the last element. It is not very pretty though :|

# incrementing
my $elem = $hash; # copy the reference 
foreach my $key ( @array ) {
   # found the bottom of the hash
   unless ( $elem->{$key} && ref($elem->{$key}) ) { 
      $elem->{$key}++;
      last; 
   }

   # not at the bottom, move to the next level
   $elem = $elem->{$key};
}
print Dumper $hash;

__END__
$VAR1 = {
      'first' => {
                   'second' => {
                                 'third' => {
                                              'fourth' => {
                                                            'fifth' => {
                                                                         'sixth' => {
                                                                                      'seventh' => 1
                                                                                    }
                                                                       }
                                                          }
                                            }
                               }
                 }
    };
Sign up to request clarification or add additional context in comments.

3 Comments

My goal is to count occurrences of ngrams in a text. E.g, in the sentence "under a rock, and under a tree", the combinations of two words following each other would be "under a, a rock, rock and, and under, under a, a tree". Then I want the frequency of each two(or three, or four)-word combinations. In that sentence, all the so-called bigrams will have the value 1 (number of occurrences), while "a rock" will have a value of 2, because it occurrs two times. Text::Ngrams would probalby work, but I am interessted in the theory behind it.
If you want to conserve space in such a way this is fine, but I am not sure of a good way to perform your count. I wrote a simpleish n-gram generator awhile ago that might provide a decent example: github.com/mcmillhj/markov-perl/Chain.pm
@vetinari: If that is your goal then you cannot use simple nested hashes like this. You need $hash{under}{a} to have a count of its own, as well as being another hash reference so that you can have a count in $hash{under}{a}{rock}
3

This is relatively simple if you maintain a current hash reference. This short program demonstrates

The first few steps make sure that each hash element exists and its value is a hash reference. $href is moved to the next level of hash at each stage. For the final element of the array, the latest hash level's element is incremented instead of being set to a hash reference.

Whether or not this data structure is the correct choice depends on what else you need to do with it once you have built it

use strict;
use warnings;

my %hash;
my @array = qw/ first second third fourth fifth /;

drill_hash(\%hash, @array);

use Data::Dump;
dd \%hash;

sub drill_hash {
  my ($href, @list) = @_;
  my $final = pop @list;
  $href = $href->{$_} //= {} for @list;
  ++$href->{$final};
}

output

{
  first => { second => { third => { fourth => { fifth => 1 } } } },
}

Update

Having understood your purpose, the simplest way to keep a count of occurrences of ngrams like that is to have a speficic hash key that is used to keep the count of the sequence of words so far.

This program uses the value _COUNT for that key, and you can see that, for example, {under}{a}{_COUNT} and {under}{a}{rock}{_COUNT} are both 1

use strict;
use warnings;

my %counts;

count_ngram(\%counts, qw/ under a /);
count_ngram(\%counts, qw/ a rock /);
count_ngram(\%counts, qw/ under a rock /);
count_ngram(\%counts, qw/ a tree /);
count_ngram(\%counts, qw/ under a tree /);

use Data::Dump;
dd \%counts;

sub count_ngram {
  my ($href, @ngram) = @_;
  my $final = pop @ngram;
  $href = $href->{$_} //= {} for @ngram;
  ++$href->{$final}{_COUNT};
}

output

{
  a => { rock => { _COUNT => 1 }, tree => { _COUNT => 1 } },
  under => {
    a => { _COUNT => 1, rock => { _COUNT => 1 }, tree => { _COUNT => 1 } },
  },
}

1 Comment

this one looks a lot cleaner than what I suggested.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.