Perl count for matching strings in an array

Question

I have an array filled with strings. I want to check whether a specific string occurs more than once in this array and, if so, print an error message.

I used the true function from List::MoreUtils to count my matches. Some strings in my array contain substrings that are identical to other strings in the same array.
So when I check whether the same string occurs more than once, I get the error warning even if there is only another string that contains it as a substring. I tried to fix this by adding the string length to the condition (so that both the string and its length have to match before the error message appears), but this does not work either.
My code looks something like this:

use strict;
use warnings;
use List::MoreUtils 'true';

my @list = ("one", "two", "three", "onefour", "one");

foreach my $f (@list) {
    my $length = length($f);
    my $count  = true { $length && "$f" } @list;

    if ($count > 1) {
        print "Error with: ", $f, " counted ", $count, " times!\n";
    }
    $count = 0;
}

With this code I do not get an error warning at all, even though "one" appears twice in the array. If I leave the length out of the block passed to true, the string "one" is counted three times.

Are you after just "one" being reported as a dupe? E.g. not substring matching?
– Sobrique, Commented Jul 10, 2015 at 11:27

Sobrique · Accepted Answer · 2015-07-10 11:51:39Z


I wouldn't use true for this - it looks like what you're trying to do is 'pick out' duplicates, and you don't care about substrings.

my %seen;
$seen{$_}++ for @list; 
print grep { $seen{$_} > 1 } @list;

So to replicate your test:

my %count_of;
$count_of{$_}++ for @list;  
foreach my $duplicate (  grep { $count_of{$_} > 1 } @list ) {
    print "Error: $duplicate was seen $count_of{$duplicate} times\n";
}
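
Note that grep over @list returns every occurrence, so with the sample data the loop above reports "one" once per occurrence. A minimal sketch of reporting each duplicated string only once, by iterating over the hash keys instead of the array; the sub name find_duplicates is hypothetical and not part of the original answer:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each string that occurs
# more than once (exact match, no substring matching) to its count.
sub find_duplicates {
    my @items = @_;
    my %count_of;
    $count_of{$_}++ for @items;
    my %dupes = map  { $_ => $count_of{$_} }
                grep { $count_of{$_} > 1 } keys %count_of;
    return \%dupes;
}

my @list  = ("one", "two", "three", "onefour", "one");
my $dupes = find_duplicates(@list);

# Sorted only to make the report order predictable; hash keys are unordered.
for my $duplicate (sort keys %$dupes) {
    print "Error: $duplicate was seen $dupes->{$duplicate} times\n";
}
```

With the sample list this prints a single line for "one"; @list itself is left untouched.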

edited Jul 10, 2015 at 11:51

answered Jul 10, 2015 at 11:26

Sobrique


3 Comments

nieka Over a year ago

I don't want to "pick out" duplicates. I want to print an error message if there are duplicates in the array, and not change the array/erase the duplicates!

Sobrique Over a year ago

This doesn't modify your array - grep creates a 'new' one that you print. I've added a snippet that I think does what you desire?

nieka Over a year ago

Sorry for the late answer. Your answer worked perfectly fine and solved my problem! Thanks a lot ;)

Community · Accepted Answer · 2017-05-23 10:26:49Z

You're actually not matching anything. I added debug output to your code.

use strict;
use warnings;
use feature 'say';
use List::MoreUtils 'true';

my @list = ( "one", "two", "three", "onefour", "one" );

foreach my $f (@list) {
    say "f: $f";
    my $length = length($f);
    say "length: $length";
    say "true { $length && $f} $_: " . ( $length && "$f" ) for @list;
    my $count = true { $length && "$f" } @list;
    say "count: $count";

    if ( $count > 1 ) {
        print "Error with: ", $f, " counted ", $count, " times!\n";
    }
    $count = 0;
}

Let's take a look:

f: one
length: 3
true { 3 && one} one: one
true { 3 && one} two: one
true { 3 && one} three: one
true { 3 && one} onefour: one
true { 3 && one} one: one
count: 5
Error with: one counted 5 times!
f: two
length: 3
true { 3 && two} one: two
true { 3 && two} two: two
true { 3 && two} three: two
true { 3 && two} onefour: two
true { 3 && two} one: two
count: 5
Error with: two counted 5 times!
f: three
length: 5
true { 5 && three} one: three
true { 5 && three} two: three
true { 5 && three} three: three
true { 5 && three} onefour: three
true { 5 && three} one: three
count: 5
Error with: three counted 5 times!
f: onefour
length: 7
true { 7 && onefour} one: onefour
true { 7 && onefour} two: onefour
true { 7 && onefour} three: onefour
true { 7 && onefour} onefour: onefour
true { 7 && onefour} one: onefour
count: 5
Error with: onefour counted 5 times!
f: one
length: 3
true { 3 && one} one: one
true { 3 && one} two: one
true { 3 && one} three: one
true { 3 && one} onefour: one
true { 3 && one} one: one
count: 5
Error with: one counted 5 times!

So you always have the length of the string $f, which is larger than 0 and hence evaluates as true in Perl. Then you have $f, which is also true, because every string other than the empty string ('') and '0' is true.

You iterate over all elements in @list with the true function, but the block never looks at $_, so it is true for every element. You therefore always get the number of elements in @list as the count.
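
For the count to mean anything, the block has to compare $_ with $f. A minimal sketch using eq for an exact, whole-string comparison follows. It uses the core grep in scalar context so that it runs without extra modules; `true { $_ eq $f } @list` from List::MoreUtils should give the same count. The sub name count_exact is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: number of elements exactly equal to $needle.
sub count_exact {
    my ($needle, @items) = @_;
    # grep in scalar context returns the number of matching elements;
    # eq compares whole strings, so "onefour" does not match "one".
    my $count = grep { $_ eq $needle } @items;
    return $count;
}

my @list = ("one", "two", "three", "onefour", "one");

foreach my $f (@list) {
    my $count = count_exact($f, @list);
    print "Error with: $f counted $count times!\n" if $count > 1;
}
```

Because the loop visits every element, the message for "one" appears once per occurrence; counting with a hash avoids that.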

If you only want to remove duplicate occurrences, you can use a hash to count them.

my %count;
$count{$_}++ for @list;
my @unique = keys %count; # unsorted
# see Sobrique's answer for a grep over @list, which keeps the original order

Then there is also uniq in List::MoreUtils, which returns the elements in their original order.

use List::MoreUtils 'uniq';

my @unique = uniq @list;
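
If List::MoreUtils is not available, an order-preserving de-duplication can be sketched with a hash and grep. This is a common idiom rather than something taken from the code above, and the names %seen and @unique_in_order are only illustrative:

```perl
use strict;
use warnings;

my @list = ("one", "two", "three", "onefour", "one");

# $seen{$_}++ is false the first time a string is met and true afterwards,
# so grep keeps only the first occurrence of each string.
my %seen;
my @unique_in_order = grep { !$seen{$_}++ } @list;

print "@unique_in_order\n";    # one two three onefour
```

The original @list is not modified; grep returns a new list.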

If you want to know for each element if it is a substring of any other element, you can use Perl's builtin index, which finds the position of a string inside another string, and a grep.

foreach my $f (@list) {
    if ( my @matches = grep { $_ ne $f && index( $_, $f ) > -1 } @list ) {
        warn "$f is a substr of: @matches";    # interpolated array is joined on $"
    }
}

__END__

one is a substr of: onefour at /code/scratch.pl line 91.
one is a substr of: onefour at /code/scratch.pl line 91.

Of course, because of the ne, this does not detect that elements 0 and 4 are both "one". Note that index returns -1 if there is no match at all.
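
One way to catch both cases is to compare array positions instead of values, so that an element is only prevented from matching itself. A rough sketch, assuming an exact duplicate should also count as a containing string; the sub name containing_elements is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the elements at other positions that contain
# $items[$i], including exact duplicates of it.
sub containing_elements {
    my ($i, @items) = @_;
    return map  { $items[$_] }
           grep { $_ != $i && index( $items[$_], $items[$i] ) > -1 }
           0 .. $#items;
}

my @list = ("one", "two", "three", "onefour", "one");

for my $i ( 0 .. $#list ) {
    my @matches = containing_elements( $i, @list );
    warn "$list[$i] (element $i) is contained in: @matches\n" if @matches;
}
```

With the sample list, elements 0 and 4 each report both "onefour" and the other "one", while "two", "three" and "onefour" produce no warning.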

Edit after your comment on Sobrique's answer:

To only get warnings if there are duplicates (or substr duplicates), simply count them. There is no modification happening anywhere:

my @list = ( "one", "two", "three", "onefour", "one" );

my %count;
$count{$_}++ for @list;
warn sprintf 'Number of duplicates: %d', @list - keys %count if @list != keys %count;

my $count_substr;
foreach my $f (@list) {
    $count_substr++
        if grep { $_ ne $f && index( $_, $f ) > -1 } @list;
}
warn sprintf 'Number of substring duplicates: %d', $count_substr if $count_substr;

Altogether a more comprehensive answer. I think we've reached similar conclusions in that true isn't really the right thing to be using.
Thank you @Sobrique. :) By the way, have you noticed that our names look similar? That always creeps me out.
