1

I'm pretty new with Perl and was hoping if anyone could help me with this issue. I need to extract two columns from a CSV file embedded commas. This is how the format looks like:

"ID","URL","DATE","XXID","DATE-LONGFORMAT"

I need to extract the DATE column, the XXID column, and the column immediately after XXID. Note, each line doesn't necessarily follow the same number of columns.

The XXID column contains a 2 letter prefix and doesn't always starts with the same letter. It can pretty much be any letter of the aplhabet. The length is always the same.

Finally, once these three columns are extracted, I need to sort on the XXID column and get a count on duplicates.

0

3 Answers 3

3

I published a module called Tie::Array::CSV which lets Perl interact with your CSV as a native Perl nested array. If you use this, you can take your search logic and apply it just as if your data were already in an array of array-references. Take a look!

#!/usr/bin/env perl

use strict;
use warnings;

use File::Temp;
use Tie::Array::CSV;
use List::MoreUtils qw/first_index/;
use Data::Dumper;

# this builds a temporary file from DATA
# normally you would just make $file the filename
my $file = File::Temp->new;
print $file <DATA>;
#########

tie my @csv, 'Tie::Array::CSV', $file;

#find column from data in first row
my $colnum = first_index { /^\w.{6}$/ } @{$csv[0]};
print "Using column: $colnum\n";

#extract that column
my @column = map { $csv[$_][$colnum] } (0..$#csv);

#build a hash of repetitions
my %reps;
$reps{$_}++ for @column;

print Dumper \%reps;
Sign up to request clarification or add additional context in comments.

Comments

3

Here's a sample script using the Text::CSV module to parse your csv data. Consult the documentation for the module to find the proper settings for your data.

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 });

while (my $row = $csv->getline(*DATA)) {
    print "Date: $row->[2]\n";
    print "Col#1: $row->[3]\n";
    print "Col#2: $row->[4]\n";
}

Comments

0

You definitely want to use a CPAN library for parsing CSV, as you will never account for all the quirks of the format.

Please see: How can I parse quoted CSV in Perl with a regex?

Please see: How do I efficiently parse a CSV file in Perl?

However, here is a very naive and non-idiomatic solution for that particular string you provided:

use strict;
use warnings;

my $string = '"ID","URL","DATE","XXID","DATE-LONGFORMAT"';

my @words = ();
my $word = "";
my $quotec = '"';
my $quoted = 0;

foreach my $c (split //, $string)
{
  if ($quoted)
  {
    if ($c eq $quotec)
    {
      $quoted = 0;
      push @words, $word;
      $word = "";
    }
    else
    {
      $word .= $c;
    }
  }
  elsif ($c eq $quotec)
  {
    $quoted = 1;
  }
}

for (my $i = 0; $i < scalar @words; ++$i)
{
  print "column " . ($i + 1) . " = $words[$i]\n";
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.