I've gotten into a bit of a jam and was wondering if someone could clear it up. What I want to do is:
- Open a number of .txt files containing data
- Create a multidimensional array that holds @array[@filenames][@data]
- Find which files are duplicates of each other in terms of data
Here I slurp a file into a variable, use a regex to extract my data, and push it into an array:
while (my $row = <$fh>) {       # first read consumes the header line
    unless ($. == 0) {          # always true after a read ($. starts at 1)
        {
            local $/;           # enable slurp mode for the rest of the file
            # $1 = article number, $2 = quantity, $3 = unit
            @datalist = <$fh> =~ /\s*\d*\/\s*\d*\|\s*(.*?)\|.*?(?:.*?\|){4}\s*(\S*)\|(\S*).*\|/g;
        }
        push @arrayofarrays, [@datalist];
        push @filenames, $file;
        last;                   # one slurp per file
    }
}
$numr++;
}   # closes an outer loop over the files (not shown)
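For context, the snippet above assumes surrounding code that supplies `$file` and `$fh`. A minimal, self-contained sketch of that scaffolding (the filenames, file contents, and the simplified capture regex below are hypothetical; in-memory filehandles stand in for real files, which would come from something like `glob '*.txt'`):

```perl
use strict;
use warnings;
use Data::Dumper;

my (@arrayofarrays, @filenames);

# Hypothetical stand-ins for the real .txt files (name => contents);
# with real files this would be:  my @files = glob '*.txt';
my %files = (
    'a.txt' => "header\nalpha beta\n",
    'b.txt' => "header\ngamma delta\n",
);

for my $file (sort keys %files) {
    open my $fh, '<', \$files{$file} or die "Can't open '$file': $!";
    my $header = <$fh>;                    # skip the header line
    my $data   = do { local $/; <$fh> };   # slurp the rest in one go
    my @datalist = $data =~ /(\w+)/g;      # placeholder for the real regex
    push @arrayofarrays, [@datalist];
    push @filenames,     $file;
}
print Dumper \@arrayofarrays;
```

The `do { local $/; <$fh> }` idiom avoids mixing line-by-line and slurp reads inside one `while` loop.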
open(my $feh, '>', 'test.txt') or die "Can't open test.txt: $!";
print {$feh} Dumper \@arrayofarrays;
Dumper shows that my data looks fine (pseudo-results to keep it short and readable):
$VAR1 = [
          [
            'data type1',
            'data type2',
            'data type3',
            'data type1',
            'data type2',
            'data type3',
            ...
          ],
          [
            'data type1',
            'data type2',
            'data type3',
            ...
          ],
          ...
        ];
So I'm wondering if anyone knows an easy way to check for duplicates between sets of data? I know I can print individual data sets using Dumper. What I tried might give a better idea of what I need to do:
my $i = 0;
while ( $i < @arrayofarrays ) {          # '<' avoids running past the last index
    my $j = 0;
    while ( $j < @arrayofarrays ) {
        if ( @{ $arrayofarrays[$i] } eq @{ $arrayofarrays[$j] } ) {
            print "\n'$filenames[$i]' is a duplicate of '$filenames[$j]'.";
        }
        $j++;
    }
    $i++;
}
The comparison here is a string comparison (eq) rather than a numeric comparison (==), and applied to arrays it compares them in scalar context, i.e. by element count, not by contents. Maybe I could use Digest::MD5 to create a checksum for each of them and compare the results?
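The checksum idea can be sketched with a hash keyed on a fingerprint of each data set, which also avoids the pairwise loops. A minimal example, assuming each inner array holds plain strings (the sample data and filenames below are hypothetical); Digest::MD5 ships with the Perl core distribution:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical sample data standing in for @filenames / @arrayofarrays
my @filenames     = ('a.txt', 'b.txt', 'c.txt');
my @arrayofarrays = (
    [ 'data type1', 'data type2' ],
    [ 'data type1', 'data type3' ],
    [ 'data type1', 'data type2' ],   # same contents as a.txt
);

my %seen;   # checksum => first filename that produced it
for my $i (0 .. $#arrayofarrays) {
    # Join with a separator that won't occur in the data, then checksum
    my $key = md5_hex(join "\0", @{ $arrayofarrays[$i] });
    if (exists $seen{$key}) {
        print "'$filenames[$i]' is a duplicate of '$seen{$key}'.\n";
    }
    else {
        $seen{$key} = $filenames[$i];
    }
}
# prints: 'c.txt' is a duplicate of 'a.txt'.
```

Joining with "\0" before hashing keeps ('ab','c') and ('a','bc') from producing the same fingerprint, as long as the data itself never contains NUL bytes; the joined string itself would also work as the hash key without MD5.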