
So let's say I had the string:

 my $str = "Hello how are you today. Oh thats good I'm glad you are happy. Thats wonderful; thats fantastic.";

I want to build a hash in which each key is a unique word and each value is the number of times that word appears in the string. I want this to be an automated process that produces something like:

my %words = (
  "Hello" => 1,
  "are" => 2,
  "thats" => 2,
  "Thats" => 1,
);

I am honestly brand new to Perl and have no clue how to do this, or how to handle the punctuation, etc.

UPDATE:

Also, is it possible to use

   split('.!?;',$mystring)   

Not with this exact syntax, but basically split at a . or ! or ? etc., and also at ' ' (whitespace).

  • How do you want to handle punctuation is the question. Is I'm a duplicate of I am, or should it only be a duplicate of itself? Is ultra-complex a duplicate of ultracomplex or not? Commented Feb 19, 2013 at 22:13
  • Anything that is different in any way should be different. I meant punctuation like .'s !'s ;'s and ?'s. Sorry. Commented Feb 19, 2013 at 22:14
  • You'll find some hints here. Commented Feb 19, 2013 at 22:14
  • also somewhat related: stackoverflow.com/questions/8252547/… Commented Feb 19, 2013 at 22:46

4 Answers


One simple way to do it is to split the string on any character that is not a valid word-character in your view. Note that this is by no means an exhaustive solution as it is. I have simply taken a limited set of characters.

You can add valid word-characters inside the brackets [ ... ] as you discover edge cases. You might also search http://search.cpan.org for modules designed for this purpose.

The regex [^ ... ] matches any character that is not listed inside the brackets. \pL matches any Unicode letter (a larger set than a-z), and the other characters are literals. The dash - must be escaped because, between two characters inside a character class, an unescaped dash denotes a range.

use strict;
use warnings;
use Data::Dumper;

my $str = "Hello how are you today. Oh thats good I'm glad you are happy.
           Thats wonderful; thats fantastic.";
my %hash;
$hash{$_}++                      # increase count for each field
    for                          # in the loop
    split /[^\pL'\-!?]+/, $str;  # over the list from splitting the string 
print Dumper \%hash;

Output:

$VAR1 = {
          'wonderful' => 1,
          'glad' => 1,
          'I\'m' => 1,
          'you' => 2,
          'how' => 1,
          'are' => 2,
          'fantastic' => 1,
          'good' => 1,
          'today' => 1,
          'Hello' => 1,
          'happy' => 1,
          'Oh' => 1,
          'Thats' => 1,
          'thats' => 2
        };
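
The class in the regex above keeps ! and ? as word characters. If you would rather split at them too, as the update to the question asks, one option is to drop them from the class. This is only a minimal sketch: the sample text and the name %count are made up for illustration, and the set of allowed characters is still an assumption you should adapt.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen to contain ! and ? for illustration.
my $text = "Hello! How are you? I'm well-rested; are you?";

my %count;
# Split on runs of anything that is not a letter, an apostrophe, or a dash.
# grep { length } drops the empty leading field that split returns
# when the text happens to start with a delimiter.
$count{$_}++ for grep { length } split /[^\pL'\-]+/, $text;

# Print the words in sorted order with their counts.
printf "%-12s => %d\n", $_, $count{$_} for sort keys %count;
```

With this class, I'm and well-rested stay whole, while !, ?, ; and whitespace all act as delimiters.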

13 Comments

Ok thanks. How do I account for the fact that Thats is NOT supposed to be thats?
@Vlad You want to distinguish between upper and lower case? Then change lc($_) to just $_. I'll remove it.
ok thanks. I'll have to work on learning that syntax. Is there an error in yours? Why is everything red i.e. a string?
@Vlad Red? Are you talking about stackoverflow's code highlighting? That's just the single quote making it think it's a quoted string.
I figured that out haha. Thanks for your help!

This will use whitespace to separate words.

#!/usr/bin/env perl
use strict;
use warnings;

my $str = "Hello how are you today."
        . " Oh thats good I'm glad you are happy."
        . " Thats wonderful. thats fantastic.";

# Use whitespace to split the string into single "words".
my @words = split /\s+/, $str;

# Store each word in the hash and count its occurrence.
my %hash;
for my $word ( @words ) {
    $hash{ $word }++;
}

# Show each word and its count. Using printf to align output.
for my $key ( sort keys %hash ) {
    printf "%-10s => %d\n", $key, $hash{ $key };
}

You will need some fine-tuning to get "real" words, because punctuation stays attached to them in the output:

Hello      => 1
I'm        => 1
Oh         => 1
Thats      => 1
are        => 2
fantastic. => 1
glad       => 1
good       => 1
happy.     => 1
how        => 1
thats      => 2
today.     => 1
wonderful. => 1
you        => 2
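
One possible form of that fine-tuning, sketched under the assumption that punctuation at the start or end of a token is never part of the word: trim it after splitting on whitespace. The trimming rule is my assumption, not part of the answer above.

```perl
use strict;
use warnings;

my $str = "Hello how are you today."
        . " Oh thats good I'm glad you are happy."
        . " Thats wonderful. thats fantastic.";

my %hash;
for my $word ( split /\s+/, $str ) {
    # Trim punctuation from both ends; the apostrophe inside I'm survives
    # because it is not at the start or end of the token.
    $word =~ s/^[[:punct:]]+|[[:punct:]]+$//g;
    next unless length $word;    # skip tokens that were only punctuation
    $hash{ $word }++;
}

# Show each word and its count, aligned with printf.
for my $key ( sort keys %hash ) {
    printf "%-10s => %d\n", $key, $hash{ $key };
}
```

This keeps the whitespace split from the answer and only post-processes each token, so today. and today end up under the same key.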

7 Comments

he needs more delimiters than just space. see what i did: my @strAry = split /[:,\.\s\/]+/, $str;
That's what the "will need some fine-tuning" is for. Waiting for homeworkoverflow.com so I can post it there. ;-)
@Perleone so for Perl, I can put a variable in the {} and it will just add it to the hash? How do I access a variable then? And if that variable is already in the hash, what happens?
@Vlad Yes. $hash{beer} = 5; adds the key beer with the value 5 to %hash. You access it the same way: print $hash{beer}; will output 5. If the key is already present, the value will be overwritten: $hash{beer} = 3;.
oh ok. so the ++ will just add one to the present value?
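
To make the hash behaviour discussed in these comments concrete, here is a small sketch. The wine key is made up for illustration; the rest follows the beer example from the comment above.

```perl
use strict;
use warnings;

my %hash;
$hash{beer} = 5;    # add the key "beer" with the value 5
$hash{beer} = 3;    # same key again: the old value is overwritten
$hash{beer}++;      # ++ adds one to the present value: now 4
$hash{wine}++;      # missing key: it is created, undef counts as 0, so now 1

# Print every key with its value.
print "$_ => $hash{$_}\n" for sort keys %hash;
```

So yes, ++ adds one to the value that is already there, and for a key that does not exist yet it starts counting from zero without a warning.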

Try this:

use strict;
use warnings;

my $str = "Hello, how are you today. Oh thats good I'm glad you are happy. 
           Thats wonderful.";
my @strAry = split /[:,\.\s\/]+/, $str;
my %strHash;

foreach my $word(@strAry) 
{
    print "\nFOUND WORD: ".$word;
    my $exstCnt = $strHash{$word};

    if(defined($exstCnt)) 
    {
        $exstCnt++;
    } 
    else 
    {
        $exstCnt = 1;
    }

    $strHash{$word} = $exstCnt;
}

print "\n\nNOW REPORTING UNIQUE WORDS:\n";

foreach my $unqWord(sort(keys(%strHash))) 
{
    my $cnt = $strHash{$unqWord};
    print "\n".$unqWord." - ".$cnt." instances";
}

13 Comments

Why the double spaced formatting? You don't have to use the concatenation operator to interpolate variables, just enter them in the string "Found word $word\n". You don't need to go over a transition variable to increment the counter, just increment it directly.
@Pfoampile so for Perl, I can put a variable in the {} and it will just add it to the hash? How do I access a variable then? And if that variable is already in the hash, what happens?
yes, @Vlad. $strHash{'Vlad'} = 1; adds key 'Vlad' to the hash and assigns value 1 to it
@TLP, good point. but i think this more verbose way is more intelligible for our beginner Vlad to follow what is going on
@foampile your explanation is very thorough. I have programmed before, just not in Perl, so it was easy to follow your logic. Thanks :)
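
For comparison, the simplification suggested in the first comment above (interpolating variables in the string and incrementing the counter directly) might look like this sketch. It keeps the answer's split pattern and variable names; the more compact layout is my choice.

```perl
use strict;
use warnings;

my $str = "Hello, how are you today. Oh thats good I'm glad you are happy. Thats wonderful.";
my %strHash;

# Increment the counter directly; a key that is not there yet starts from 0.
$strHash{$_}++ for split /[:,.\s\/]+/, $str;

print "NOW REPORTING UNIQUE WORDS:\n";
for my $unqWord ( sort keys %strHash ) {
    # Variables interpolate inside double-quoted strings, so no "." is needed.
    print "$unqWord - $strHash{$unqWord} instances\n";
}
```

Both versions count the same way; the verbose one spells out each step, which can be easier to follow when the language is new to you.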
 use YAML qw(Dump);
 use 5.010;

 my $str = "Hello how are you today. Oh thats good I'm glad you are happy. Thats wonderful; thats fantastic.";
 my @match_words = $str =~ /(\w+)/g;
 my $word_hash = {};
 foreach my $word (sort @match_words) {
     $word_hash->{$word}++;
 }
 say Dump($word_hash);
 # -------output----------
 Hello: 1
 I: 1
 Oh: 1
 Thats: 1
 are: 2
 fantastic: 1
 glad: 1
 good: 1
 happy: 1
 how: 1
 m: 1
 thats: 2
 today: 1
 wonderful: 1
 you: 2
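
As the output shows, \w+ breaks I'm into I and m. If contractions should stay whole, one option is to let the pattern accept an apostrophe between word characters. This is a sketch under that assumption; it uses only core Perl instead of YAML, and a plain hash instead of a hash reference.

```perl
use strict;
use warnings;

my $str = "Hello how are you today. Oh thats good I'm glad you are happy. Thats wonderful; thats fantastic.";

# Match runs of word characters, optionally joined by inner apostrophes.
my @match_words = $str =~ /(\w+(?:'\w+)*)/g;

my %word_hash;
$word_hash{$_}++ for @match_words;

# Print in the same "word: count" style as the YAML dump above.
print "$_: $word_hash{$_}\n" for sort keys %word_hash;
```

Because the apostrophe is only accepted between word characters, a quote mark at the start or end of a word is still left out of the match.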
