I want to find duplicate arrays in a hash whose values are arrays. I am building sets and storing them in a Perl hash. Afterwards, I need to extract: 1. the arrays that are complete duplicates (all values the same), and 2. the intersection of the arrays.

The source code is as follows:


use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Bob", "Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Bob",  "Grook", "Franky");
my @test5= ();
my @test6=();

my %arrayHash= ( "ppl1" => [@test1],
             "ppl2"=> [@test2], 
             "ppl3" => [@test3],
             "ppl4"=> [@test4], 
             "ppl5"=> [@test5],
             "ppl6"=> [@test6],  

            );


Required output: ppl1 and ppl3 have duplicate lists
Intersection of arrays = Bob

Please note that empty arrays should not be reported as duplicates!

3 Answers

So there's a set of steps here:

  • Compare your arrays with one another. This is harder because they are multi-element arrays: you can't test them for equality directly, you have to compare their members.

  • Filter one from the other.

So first of all:

(Edit: now copes with empty arrays.)

#!/usr/bin/env perl

use strict;
use warnings;

my @test1 = ( "Bob",   "Flip",  "David" );
my @test2 = ( "Kevin", "John",  "Michel" );
my @test3 = ( "Bob",   "Flip",  "David" );
my @test4 = ( "Haidi", "Grook", "Franky" );
my @test5 = ();
my @test6 = ();

my %arrayHash = (
    "ppl1" => [@test1],
    "ppl2" => [@test2],
    "ppl3" => [@test3],
    "ppl4" => [@test4],
    "ppl5" => [@test5],
    "ppl6" => [@test6],

);

my %seen;

#cycle through the hash
foreach my $key ( sort keys %arrayHash ) {

    #skip empty:
    next unless @{ $arrayHash{$key} };

    #turn your array into a string - ':' separated
    my $value_str = join( ":", sort @{ $arrayHash{$key} } );

    #check if that 'value string' has already been seen
    if ( $seen{$value_str} ) {
        print "$key is a duplicate of $seen{$value_str}\n";
    }
    $seen{$value_str} = $key;
}

Note that this is a bit of a cheat: it joins each array's elements with ":", which doesn't work in every scenario.

("Bob:Flip") and ("Bob", "Flip") will end up as the same string.
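One possible way around the separator problem is to prefix every element with its length, so that no character inside an element can be mistaken for a boundary. This is only a sketch: `signature` is a hypothetical helper name, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a comparison key in which each element is
# prefixed with its length, so a ':' inside the data cannot blur boundaries.
sub signature {
    my @items = sort @_;
    return join '', map { length($_) . ':' . $_ } @items;
}

# With a plain join( ":", ... ) both of these lists would become "Bob:Flip".
print signature("Bob:Flip"), "\n";       # 8:Bob:Flip
print signature("Bob", "Flip"), "\n";    # 3:Bob4:Flip
```

The key is longer than a plain join, but two lists get the same key only when they hold the same elements.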

If a list occurs more than twice, each message also names only the most recently seen key, because %seen stores a single key per value string.

You can work around this, if you want, by pushing every matching key into the %seen hash.
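A minimal sketch of that workaround, using the data from the question (where every non-empty list contains "Bob"). The names `%keys_for`, `%count` and `$lists` are illustrative. The same loop also counts elements to get the intersection the question asks for; leaving empty lists out of the intersection is an assumption based on the required output.

```perl
use strict;
use warnings;

# Data from the question.
my %arrayHash = (
    ppl1 => [ "Bob", "Flip", "David" ],
    ppl2 => [ "Bob", "Kevin", "John", "Michel" ],
    ppl3 => [ "Bob", "Flip", "David" ],
    ppl4 => [ "Haidi", "Bob", "Grook", "Franky" ],
    ppl5 => [],
    ppl6 => [],
);

my %keys_for;     # value string => [ keys whose lists have those values ]
my %count;        # element      => number of non-empty lists containing it
my $lists = 0;    # number of non-empty lists

foreach my $key ( sort keys %arrayHash ) {
    my @items = @{ $arrayHash{$key} };
    next unless @items;    # skip empty lists, as the question asks
    $lists++;

    # same ':'-joined string as above, with the same caveat about ':' in data
    push @{ $keys_for{ join ":", sort @items } }, $key;

    # count each element once per list, even if a list repeats it
    my %in_list = map { $_ => 1 } @items;
    $count{$_}++ for keys %in_list;
}

# report every group of keys that share a value string
foreach my $value_str ( sort keys %keys_for ) {
    my @group = @{ $keys_for{$value_str} };
    print join( " and ", @group ), " have duplicate lists\n" if @group > 1;
}

# an element is in the intersection if every non-empty list contains it
my @common = sort grep { $count{$_} == $lists } keys %count;
print "Intersection of arrays = @common\n";
```

With this data the sketch prints "ppl1 and ppl3 have duplicate lists" followed by "Intersection of arrays = Bob". A list that occurs three or more times is reported once, with all of its keys on one line.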

1 Comment

Your code is quite understandable. However, it does not produce the desired output: it prints only the keys that are being duplicated. How can we edit the code so that it prints the keys that duplicate other keys? Thanks

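You need to compare the arrays stored under the hash keys for equality. For that you can use the smart match operator (~~). Note that smart match has been experimental since Perl 5.18, so newer perls emit a warning for this code, and it is deprecated as of Perl 5.38.

Next you can use grep to filter out the keys whose arrays are not duplicates, and a hash to keep track of the keys that have already been reported.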

#!/usr/bin/perl
use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");
my @test5= ("Bob", "Flip", "David");
my @test6= ("Kevin", "John", "Michel");
my @test7= ("Haidi", "Grook", "Frank4");


my %arrayHash= ( "ppl1" => [@test1],
                 "ppl2"=> [@test2],
                 "ppl3" => [@test3],
                 "ppl4"=> [@test4],
                 "ppl5"=> [@test5],
                 "ppl6"=> [@test6],
                 "ppl7"=> [@test7]
                );

my %seen;
foreach my $key1 (sort keys %arrayHash){
   next unless @{$arrayHash{$key1}};
   my @keys;
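   # collect every other key whose array smart-matches this one,
   # unless this key was already reported as part of a group
   if ( @keys = grep {
            ( @{ $arrayHash{$key1} } ~~ @{ $arrayHash{$_} } )
         && ( $_ ne $key1 )
         && ( not exists $seen{$key1} )
        } sort keys %arrayHash ) {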
           unshift(@keys,$key1);
           print "@keys are duplicates \n";
           @seen{@keys}=@keys;
      }
}

output:

ppl1 ppl3 ppl5 are duplicates 
ppl2 ppl6 are duplicates
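Because smart match is experimental in newer perls, the same grouping idea can be sketched without it. This is one possible variant, not the code above: `same_list` is a hypothetical helper that compares two array references element by element, in order, and the data is a shortened illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both array refs hold the same strings
# in the same order.
sub same_list {
    my ( $x, $y ) = @_;
    return 0 unless @$x == @$y;    # different lengths can never match
    for my $i ( 0 .. $#$x ) {
        return 0 if $x->[$i] ne $y->[$i];
    }
    return 1;
}

# Shortened illustrative data, including two empty lists.
my %arrayHash = (
    ppl1 => [ "Bob",   "Flip", "David" ],
    ppl2 => [ "Kevin", "John", "Michel" ],
    ppl3 => [ "Bob",   "Flip", "David" ],
    ppl4 => [],
    ppl5 => [],
);

my %seen;
my @report;
foreach my $key1 ( sort keys %arrayHash ) {
    next if $seen{$key1};                  # already reported in a group
    next unless @{ $arrayHash{$key1} };    # ignore empty lists
    my @dups = grep {
        $_ ne $key1 && same_list( $arrayHash{$key1}, $arrayHash{$_} )
    } sort keys %arrayHash;
    next unless @dups;
    $seen{$_} = 1 for $key1, @dups;
    push @report, "$key1 @dups are duplicates";
}
print "$_\n" for @report;    # ppl1 ppl3 are duplicates
```

Like the smart match version, this treats ("Bob", "Flip") and ("Flip", "Bob") as different lists. Sort both lists before comparing if order should not matter.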

10 Comments

Is it possible to avoid duplicate detection for empty arrays? The trouble is that my actual data contains some empty arrays as well, and your code reports them as duplicates too, which is not desired!
But they literally are duplicates, by content.
@Sobrique indeed. @Analyzer you can just add a condition to check whether the array is defined, like if(defined @{$arrayHash{$key1}}){...}
An array in scalar context returns its number of elements, so next unless @{$arrayHash{$key1}} will probably do the trick too.
Thanks @Sobrique, that is a simpler approach.
use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");

my %arrayHash= ( "1" => \@test1,
             "2"=> \@test2,
             "3" => \@test3,
             "4"=> \@test4,

            );

sub arrayCmp {
        my @array1 = @{$_[0]};
        my @array2 = @{$_[1]};

        return 0 if ($#array1 != $#array2);

        @array1 = sort(@array1);
        @array2 = sort(@array2);

        for (my $ii = 0; $ii <= $#array1; $ii++) {
                if ($array1[$ii] ne $array2[$ii]) {
                        #print "$array1[$ii] != $array2[$ii]\n";
                        return 0;
                }
        }

        return 1;
}


my @keyArr = sort(keys(%arrayHash));
for(my $i = 0; $i <= $#keyArr - 1; $i++) {

        my @arr1 = @{$arrayHash{$keyArr[$i]}};

        # start after $i so that each pair of keys is compared exactly once
        for(my $j = $i + 1; $j <= $#keyArr; $j++) {
                my @arr2 = @{$arrayHash{$keyArr[$j]}};
                if ($keyArr[$i] ne $keyArr[$j] && arrayCmp(\@arr1, \@arr2) == 1) {
                        print "$keyArr[$i] and $keyArr[$j] are duplicates\n";
                }
        }
}

Outputs this

1 and 3 are duplicates

8 Comments

Thanks for the effort. But it still does not produce the desired output. I would like to get "1 and 3 are duplicates". Moreover, I am using a hash of arrays, so the processing should be carried out through the hash.
Add a simple check. When keys are equal, don't print "are duplicates" messages. Most of the hard work is done for you.
Edited answer. Enjoy!!
Thanks. Can we look for an easier approach :) ?
This is actually a very simple approach. We are iterating over the hash and comparing each element with every other element. It's just that Perl syntax makes it hard on the eyes. Try to understand the code line by line to learn more about it. You have already saved a lot of time writing it, so spend that time understanding it.