How to extract multiple columns from a CSV file using Perl

Question

I'm pretty new to Perl and was hoping someone could help me with this issue. I need to extract a few columns from a CSV file that has embedded commas. This is what the format looks like:

"ID","URL","DATE","XXID","DATE-LONGFORMAT"

I need to extract the DATE column, the XXID column, and the column immediately after XXID. Note that not every line necessarily has the same number of columns.

The XXID column contains a 2-letter prefix that doesn't always start with the same letter. It can be pretty much any letter of the alphabet. The length is always the same.

Finally, once these three columns are extracted, I need to sort on the XXID column and get a count on duplicates.

Joel Berger · Accepted Answer · 2012-04-09 15:28:42Z

3

I published a module called Tie::Array::CSV which lets Perl interact with your CSV as a native Perl nested array. If you use this, you can take your search logic and apply it just as if your data were already in an array of array-references. Take a look!

#!/usr/bin/env perl

use strict;
use warnings;

use File::Temp;
use Tie::Array::CSV;
use List::MoreUtils qw/first_index/;
use Data::Dumper;

# this builds a temporary file from DATA
# normally you would just make $file the filename
my $file = File::Temp->new;
print $file <DATA>;
#########

tie my @csv, 'Tie::Array::CSV', $file;

#find column from data in first row
my $colnum = first_index { /^\w.{6}$/ } @{$csv[0]};
print "Using column: $colnum\n";

#extract that column
my @column = map { $csv[$_][$colnum] } (0..$#csv);

#build a hash of repetitions
my %reps;
$reps{$_}++ for @column;

print Dumper \%reps;
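Data::Dumper prints the hash in no particular order, and the question asks for a sorted list of duplicates. The sketch below is one way to report that, using only core Perl. The sub name `duplicate_counts` and the sample IDs are made up for illustration; in the script above you would pass `@column` instead.

```perl
use strict;
use warnings;

# Count how often each value occurs and return [value, count] pairs
# for the values seen more than once, sorted by value.
sub duplicate_counts {
    my @values = @_;
    my %reps;
    $reps{$_}++ for @values;
    return map { [ $_, $reps{$_} ] } grep { $reps{$_} > 1 } sort keys %reps;
}

# Hypothetical XXID values standing in for the extracted column.
my @column = qw(ZQ00001 AB12345 MX77777 AB12345 ZQ00001 AB12345);

# Prints AB12345 with a count of 3, then ZQ00001 with a count of 2.
printf "%-10s %d\n", @$_ for duplicate_counts(@column);
```

Sorting the keys before filtering keeps the report in XXID order, which is what the question asks for.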

edited Apr 9, 2012 at 15:28

user1228

answered Feb 17, 2012 at 5:39

Joel Berger


TLP · Accepted Answer · 2012-04-09 15:29:10Z

3

Here's a sample script using the Text::CSV module to parse your CSV data. Consult the module's documentation to find the proper settings for your data.

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

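my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Reads from the script's __DATA__ section; pass a filehandle opened on
# your CSV file to getline() to read real data instead.
while (my $row = $csv->getline(*DATA)) {
    print "Date: $row->[2]\n";
    print "Col#1: $row->[3]\n";
    print "Col#2: $row->[4]\n";
}

__DATA__
"ID","URL","DATE","XXID","DATE-LONGFORMAT"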
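Because the question says the number of columns varies from line to line, fixed indexes may not always point at the right fields. The sketch below locates the XXID field in each row by its shape instead, then takes the fields on either side of it. Several things here are assumptions rather than facts from the question: that an XXID is two letters followed by five more characters, that DATE is the field immediately before XXID, and the sample rows themselves. The sub names `pick_columns` and `extract_sorted` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Given one parsed row (array ref), return [date, xxid, next], or nothing
# when no field looks like an XXID.
# Assumption: an XXID is two letters followed by five more characters,
# and DATE is the field immediately before it.
sub pick_columns {
    my ($row) = @_;
    for my $i (1 .. $#$row) {
        next unless defined($row->[$i])
            && $row->[$i] =~ /\A[A-Za-z]{2}.{5}\z/;
        return [ $row->[$i - 1], $row->[$i], $row->[$i + 1] // '' ];
    }
    return;
}

# Parse every line from $fh and return the picked columns,
# sorted by XXID and then by date.
sub extract_sorted {
    my ($fh) = @_;
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
    my @picked;
    while (my $row = $csv->getline($fh)) {
        my $cols = pick_columns($row) or next;   # skips the header row too
        push @picked, $cols;
    }
    die "CSV parse error: " . $csv->error_diag unless $csv->eof;
    my @sorted = sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] } @picked;
    return @sorted;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = <<'CSV';
"ID","URL","DATE","XXID","DATE-LONGFORMAT"
"1","http://example.com/a,b","2012-02-15","ZQ00001","Wed, 15 Feb 2012"
"2","http://example.com/c","extra","2012-02-16","AB12345","Thu, 16 Feb 2012"
"3","http://example.com/d","2012-02-17","AB12345"
CSV
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

printf "%s\t%s\t%s\n", @$_ for extract_sorted($fh);
```

If the real XXID format is known more precisely, tightening the pattern in `pick_columns` reduces the chance of matching the wrong field.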

edited Apr 9, 2012 at 15:29

user1228

answered Feb 15, 2012 at 22:32

TLP


Community · Accepted Answer · 2017-05-23 11:50:48Z

You definitely want to use a CPAN library for parsing CSV, as you will never account for all the quirks of the format.

Please see: How can I parse quoted CSV in Perl with a regex?

Please see: How do I efficiently parse a CSV file in Perl?

However, here is a very naive and non-idiomatic solution for that particular string you provided:

use strict;
use warnings;

my $string = '"ID","URL","DATE","XXID","DATE-LONGFORMAT"';

my @words = ();
my $word = "";
my $quotec = '"';
my $quoted = 0;

foreach my $c (split //, $string)
{
  if ($quoted)
  {
    if ($c eq $quotec)
    {
      $quoted = 0;
      push @words, $word;
      $word = "";
    }
    else
    {
      $word .= $c;
    }
  }
  elsif ($c eq $quotec)
  {
    $quoted = 1;
  }
}

for (my $i = 0; $i < scalar @words; ++$i)
{
  print "column " . ($i + 1) . " = $words[$i]\n";
}
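The loop above only collects text between quote characters, so it would drop unquoted fields and split a field that contains an escaped quote (`""`). As an illustration of how quickly the special cases pile up, here is a slightly extended sketch that also handles unquoted fields, embedded commas, and doubled quotes. The sub name `parse_csv_line` is made up, and this is still not a substitute for a CPAN parser: it does not handle fields that span several lines, other separators, or encodings.

```perl
use strict;
use warnings;

# Split one CSV line into fields. Handles quoted fields with embedded
# commas, doubled quotes ("") inside quoted fields, and unquoted fields.
sub parse_csv_line {
    my ($line) = @_;
    chomp $line;
    my @fields;
    my $field  = '';
    my $quoted = 0;
    my @chars  = split //, $line;

    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($quoted) {
            if ($c eq '"') {
                if ($i < $#chars && $chars[$i + 1] eq '"') {
                    $field .= '"';    # doubled quote is a literal quote
                    $i++;
                }
                else {
                    $quoted = 0;      # closing quote
                }
            }
            else {
                $field .= $c;
            }
        }
        elsif ($c eq '"') {
            $quoted = 1;              # opening quote
        }
        elsif ($c eq ',') {
            push @fields, $field;     # separator outside quotes ends the field
            $field = '';
        }
        else {
            $field .= $c;
        }
    }
    push @fields, $field;             # last field has no trailing comma
    return @fields;
}

my @words = parse_csv_line('"ID","URL","DATE","XXID","DATE-LONGFORMAT"');
print "column " . ($_ + 1) . " = $words[$_]\n" for 0 .. $#words;
```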
