Find Duplicate arrays and Intersection of arrays in array of hash values using Perl

Question

I want to find duplicate arrays in a hash whose values are arrays. I am building sets and storing them in a Perl hash. Afterwards, I need to extract: 1. the arrays that are complete duplicates (all values the same); 2. the intersection of the arrays.

The source code is given below:


use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Bob", "Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Bob",  "Grook", "Franky");
my @test5= ();
my @test6=();

my %arrayHash= ( "ppl1" => [@test1],
             "ppl2"=> [@test2], 
             "ppl3" => [@test3],
             "ppl4"=> [@test4], 
             "ppl5"=> [@test5],
             "ppl6"=> [@test6],  

            );


Required output: ppl1 and ppl3 have duplicate lists
Intersection of arrays = Bob

Please note that empty arrays should not be reported as duplicates of each other!

Sobrique · Accepted Answer · 2015-09-16 11:40:03Z

1

So there's a set of steps here:

Compare your arrays with one another. This is harder because they are multi-element arrays: you can't test equivalence directly, you need to compare their members.
Filter one from the other.

So first of all:

(Edit: coping with empty arrays)

#!/usr/bin/env perl

use strict;
use warnings;

my @test1 = ( "Bob",   "Flip",  "David" );
my @test2 = ( "Kevin", "John",  "Michel" );
my @test3 = ( "Bob",   "Flip",  "David" );
my @test4 = ( "Haidi", "Grook", "Franky" );
my @test5 = ();
my @test6 = ();

my %arrayHash = (
    "ppl1" => [@test1],
    "ppl2" => [@test2],
    "ppl3" => [@test3],
    "ppl4" => [@test4],
    "ppl5" => [@test5],
    "ppl6" => [@test6],

);

my %seen;

#cycle through the hash
foreach my $key ( sort keys %arrayHash ) {

    #skip empty:
    next unless @{ $arrayHash{$key} };

    #turn your array into a string - ':' separated
    my $value_str = join( ":", sort @{ $arrayHash{$key} } );

    #check if that 'value string' has already been seen
    if ( $seen{$value_str} ) {
        print "$key is a duplicate of $seen{$value_str}\n";
    }
    $seen{$value_str} = $key;
}

Now note - this is a bit of a cheat - it sticks your array elements together with :, which doesn't work in every scenario.

("Bob:Flip") and ("Bob", "Flip") will end up as the same string.

If more than two arrays match, it will also only report each one against the most recently seen match, not against the whole group.

You can work around this - if you want - by pushing multiple values into the %seen hash.
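As a rough sketch of that workaround (not part of the original answer): the version below pushes every key into %seen under its value string, so a whole group of duplicates is reported at once. The helper names group_duplicates and intersection are made up for illustration, and length-prefixing each element is just one assumed way to avoid the separator collision described above. The second helper counts, per element, how many non-empty arrays contain it, which gives the intersection the question asks for.

```perl
use strict;
use warnings;

# Hypothetical helper: group the keys whose arrays hold the same elements
# (order-insensitive, like the sort/join above). Empty arrays are skipped.
sub group_duplicates {
    my ($hash_of_arrays) = @_;
    my %seen;    # value string => [ keys sharing that value string ]
    for my $key ( sort keys %$hash_of_arrays ) {
        my @values = @{ $hash_of_arrays->{$key} };
        next unless @values;

        # Prefix each element with its length so ("a:b") and ("a", "b") differ.
        my $value_str = join '', map { length($_) . ':' . $_ } sort @values;
        push @{ $seen{$value_str} }, $key;
    }

    # Keep only the groups that contain more than one key.
    my @groups = grep { @$_ > 1 } values %seen;
    @groups = sort { $a->[0] cmp $b->[0] } @groups;
    return @groups;
}

# Hypothetical helper: elements that occur in every non-empty array.
sub intersection {
    my ($hash_of_arrays) = @_;
    my @lists = grep { @$_ > 0 } values %$hash_of_arrays;
    my %count;
    for my $list (@lists) {
        my %in_list = map { $_ => 1 } @$list;    # count each element once per array
        $count{$_}++ for keys %in_list;
    }
    my @common = grep { $count{$_} == @lists } keys %count;
    @common = sort @common;
    return @common;
}

# Data from the question.
my %arrayHash = (
    ppl1 => [ "Bob",   "Flip",  "David" ],
    ppl2 => [ "Bob",   "Kevin", "John", "Michel" ],
    ppl3 => [ "Bob",   "Flip",  "David" ],
    ppl4 => [ "Haidi", "Bob",   "Grook", "Franky" ],
    ppl5 => [],
    ppl6 => [],
);

for my $group ( group_duplicates( \%arrayHash ) ) {
    print join( " and ", @$group ), " have duplicate lists\n";
}
print "Intersection of arrays = ", join( ", ", intersection( \%arrayHash ) ), "\n";
```

With the question's data this should report ppl1 and ppl3 as one group and Bob as the only common element. Whether empty arrays should take part in the intersection is a design choice; here they are ignored, to match the request not to treat them as duplicates.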

edited Sep 16, 2015 at 11:40

answered Sep 16, 2015 at 9:02

Sobrique


1 Comment

Analyzer Over a year ago

Your code is quite understandable. However, it does not produce the desired output: it prints only the keys that are being duplicated. How can we edit the code to output the keys together with the keys they duplicate? Thanks

Arunesh Singh · Accepted Answer · 2015-09-16 11:46:24Z

0

You need to compare the arrays stored under the hash keys for equality. For that you can use the smart match operator (~~).

Next, you can use grep to filter out the keys whose arrays are not duplicates, and a hash to keep track of the keys that have already been reported.

#!/usr/bin/perl
use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");
my @test5= ("Bob", "Flip", "David");
my @test6= ("Kevin", "John", "Michel");
my @test7= ("Haidi", "Grook", "Frank4");


my %arrayHash= ( "ppl1" => [@test1],
                 "ppl2"=> [@test2],
                 "ppl3" => [@test3],
                 "ppl4"=> [@test4],
                 "ppl5"=> [@test5],
                 "ppl6"=> [@test6],
                 "ppl7"=> [@test7]
                );

my %seen;
foreach my $key1 (sort keys %arrayHash) {
    # skip empty arrays
    next unless @{ $arrayHash{$key1} };

    # collect every other key whose array matches, unless $key1 was already reported
    my @keys = grep {
        ( @{ $arrayHash{$key1} } ~~ @{ $arrayHash{$_} } )
            && ( $_ ne $key1 )
            && ( not exists $seen{$key1} )
    } sort keys %arrayHash;

    if (@keys) {
        unshift( @keys, $key1 );
        print "@keys are duplicates \n";
        @seen{@keys} = @keys;
    }
}

output:

ppl1 ppl3 ppl5 are duplicates 
ppl2 ppl6 are duplicates
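
One caveat worth adding: the smart match operator has been marked experimental since Perl 5.18, so the code above warns on newer interpreters. A minimal sketch of an order-sensitive, element-by-element comparison that could stand in for ~~ is shown below; arrays_equal is a hypothetical helper name, not something from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both array refs hold the same strings
# in the same order. Note that two empty arrays also compare as equal,
# so callers still need to skip empty arrays themselves.
sub arrays_equal {
    my ( $left, $right ) = @_;
    return 0 unless @$left == @$right;    # different lengths can never match
    for my $i ( 0 .. $#$left ) {
        return 0 if $left->[$i] ne $right->[$i];
    }
    return 1;
}

for my $other ( [ "Bob", "Flip" ], [ "Flip", "Bob" ] ) {
    my $verdict = arrays_equal( [ "Bob", "Flip" ], $other ) ? "equal" : "different";
    print "(Bob Flip) vs (@$other): $verdict\n";
}
```

In the grep above, the smart match test would then read arrays_equal( $arrayHash{$key1}, $arrayHash{$_} ).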

edited Sep 16, 2015 at 11:46

answered Sep 16, 2015 at 9:59

Arunesh Singh


10 Comments

Analyzer Over a year ago

Is it possible to avoid duplicate detection for empty arrays? The trouble is that my actual data contains some empty arrays as well, and your code also reports them as duplicates, which is not desired!

Sobrique Over a year ago

But they literally are duplicates, by content.

Arunesh Singh Over a year ago

@Sobrique indeed. @Analyzer you can just add a condition to check whether the array is defined, like if(defined @{$arrayHash{$key1}}){...}

Sobrique Over a year ago

An array in scalar context returns its number of elements. So next unless @{$arrayHash{$key1}} will probably do the trick too.

Arunesh Singh Over a year ago

Thanks @Sobrique, that is a simpler approach.
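
To illustrate the scalar-context check from the comments above, here is a small sketch with made-up sample data. It deliberately avoids defined on an array, which was long deprecated and, as far as I know, is rejected outright from Perl 5.22 on.

```perl
use strict;
use warnings;

# Sample data for illustration only.
my %arrayHash = (
    ppl1 => [ "Bob", "Flip" ],
    ppl5 => [],
);

# In numeric (scalar) context a dereferenced array yields its element count,
# so comparing it with 0 separates empty arrays from non-empty ones.
my @non_empty = grep { @{ $arrayHash{$_} } > 0 } sort keys %arrayHash;

my $ppl5_size = scalar @{ $arrayHash{ppl5} };

print "non-empty keys: @non_empty\n";
print "ppl5 holds $ppl5_size elements\n";
```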


Sameer Naik · Accepted Answer · 2015-09-16 07:05:53Z

0

use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");

my %arrayHash= ( "1" => \@test1,
             "2"=> \@test2,
             "3" => \@test3,
             "4"=> \@test4,

            );

sub arrayCmp {
        my @array1 = @{$_[0]};
        my @array2 = @{$_[1]};

        return 0 if ($#array1 != $#array2);

        @array1 = sort(@array1);
        @array2 = sort(@array2);

        for (my $ii = 0; $ii <= $#array1; $ii++) {
                if ($array1[$ii] ne $array2[$ii]) {
                        #print "$array1[$ii] != $array2[$ii]\n";
                        return 0;
                }
        }

        return 1;
}


my @keyArr = sort(keys(%arrayHash));
for(my $i = 0; $i <= $#keyArr - 1; $i++) {

        my @arr1 = @{$arrayHash{$keyArr[$i]}};

        # start after $i so that each pair of keys is compared (and reported) only once
        for(my $j = $i + 1; $j <= $#keyArr; $j++) {
                my @arr2 = @{$arrayHash{$keyArr[$j]}};
                if ($keyArr[$i] ne $keyArr[$j] && arrayCmp(\@arr1, \@arr2) == 1) {
                        print "$keyArr[$i] and $keyArr[$j] are duplicates\n";
                }
        }
}

Outputs this

1 and 3 are duplicates

edited Sep 16, 2015 at 7:05

answered Sep 16, 2015 at 6:50

Sameer Naik


8 Comments

Analyzer Over a year ago

Thanks for the effort. But it still does not produce the desired output. I would like to get "1 and 3 are duplicates". Moreover, I am using a hash of arrays, so the processing should be carried out through the hash.

Sameer Naik Over a year ago

Add a simple check. When keys are equal, don't print "are duplicates" messages. Most of the hard work is done for you.

Sameer Naik Over a year ago

Edited answer. Enjoy!!

Analyzer Over a year ago

Thanks. Can we look for an easier approach :) ?

Sameer Naik Over a year ago

This is actually a very simple approach. We are iterating over the hash and comparing each element with every other element. It's just that Perl syntax makes it hard on the eyes. Try to understand the code line by line to learn more about it. You have already saved a lot of time on writing it, so spend that time on understanding it.
