Unable to parse csv file using Text::CSV

Question

I have a CSV file in this format:

"Keyword"   "Competition"   "Global Monthly Searches"   "Local Monthly Searches (United States)"    "Approximate CPC (Search) - INR"

"kasperaky support" -0  -0  -0  -0

The first line is the column titles.

I've tried most of the options in Text::CSV, but I'm not able to extract the fields.

Here I used sep_char => ' '.

The closest I got was extracting the first word of the first column ("kasperaky" only).

I'm creating the object this way (while trying various settings):

my $csv = Text::CSV->new({
    binary             => 1,
    sep_char           => ' ',
    allow_loose_quotes => 0,
    quote_space        => 0,
    quote_char         => '"',
    allow_whitespace   => 0,
    eol                => "\015\012",
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

For what it's worth, I tried this and each field was parsed in full (e.g. "kasperaky support"), but every individual space outside the quote marks was treated as a delimiter, so I ended up with a lot of empty strings for fields. If your file really has an unpredictable number of spaces between fields, you may have to massage the input by collapsing each run of spaces into a single space before feeding it to Text::CSV. (I used Perl 5.12.4 and Text::CSV 1.21.)
– L2G, Commented Jun 5, 2012 at 16:55
I am afraid that pasting the file content here in a textarea mangled it. Upload the file somewhere so we can have a close look at the unchanged content, or provide a hexdump.
– daxim, Commented Jun 6, 2012 at 7:43
@daxim here is the file: docs.google.com/open?id=0B7aEugGV1GwTNk84bXlPSkM3dzQ
– AgA, Commented Jun 9, 2012 at 8:13

daxim · Accepted Answer · 2012-06-09 14:23:07Z

5

Your CSV is tab-separated. Use the following (code is tested to work against your example file):

use strictures;
use autodie qw(:all);       # automatic error checking open/close
use charnames qw(:full);    # \N named characters
use Text::CSV qw();
my $csv = Text::CSV->new({
    auto_diag   => 2,       # automatic error checking CSV methods
    binary      => 1,
    eol         => "\N{CR}\N{LF}",
    sep_char    => "\N{TAB}",
}) or die 'Cannot use CSV: ' . Text::CSV->error_diag;

open my $fh, '<:encoding(ASCII)', 'computer crash.csv';
while (my $row = $csv->getline($fh)) {
    ...
}
close $fh;
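
As a rough sketch of what the loop body might do (this is not part of the tested code above), the example below reads the sample rows from an in-memory string instead of the downloaded file. The `$sample` data, the `@rows` array and the blank-row check are illustrative assumptions. `eol` is left unset here, which lets Text::CSV recognise the usual line endings on input by itself.

```perl
use strict;
use warnings;
use Text::CSV;

# Sample rows based on the question, written with explicit tabs
# (assumption: the real file has exactly one tab between fields).
my $sample = join "\n",
    qq{"Keyword"\t"Competition"\t"Global Monthly Searches"},
    q{},
    qq{"kasperaky support"\t-0\t-0},
    q{};

my $csv = Text::CSV->new({
    auto_diag => 2,       # die on parse errors
    binary    => 1,
    sep_char  => "\t",
}) or die 'Cannot use CSV: ' . Text::CSV->error_diag();

open(my $fh, '<', \$sample) or die "Cannot open in-memory handle: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    # Skip rows that carry no data, such as blank lines.
    next unless grep { defined && length } @$row;
    push @rows, $row;
}
close $fh;

for my $row (@rows) {
    print scalar(@$row), ' fields: ', join(' | ', @$row), "\n";
}
```

Each element of `@rows` is an array reference holding one parsed line, with the header line first.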

edited Jun 9, 2012 at 14:23

daxim


answered Jun 5, 2012 at 16:43

Zaid


3 Comments

AgA Over a year ago

Thanks @Zaid for pointing out the tab ("\t"). It works, but it reads only the first line; it doesn't move on to the second line.

L2G Over a year ago

@AgA Do you really have an empty line between the column names and the first line of data? If so, I would leave that out.

AgA Over a year ago

@L2G I've put a link to my input file above. The file is exactly as downloaded from the Google Keyword Tool. I've managed without Text::CSV for now, but that's only a temporary fix; I want to use Text::CSV for compatibility.

ikegami · Accepted Answer · 2012-06-05 17:07:32Z

4

To call that a CSV file is a bit of a stretch! Your separator isn't a space, it's a sequence of one or more spaces, and Text::CSV doesn't handle that. (Unfortunately, allow_whitespace doesn't work when your separator is a space.) You could use something like:

use List::MoreUtils qw( apply );
my @fields = apply { s/\\(.)/$1/sg } $line =~ /"((?:[^"\\]|\\.)*)"/sg;

Now, if those are tabs, that's a different story, and you could use sep_char => "\t".
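
For the tab case, a minimal sketch of parsing a single line might look like the following. It assumes the data row from the question with exactly one tab between fields; the `$line` and `@fields` names are only for illustration.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, sep_char => "\t" })
    or die 'Cannot use CSV: ' . Text::CSV->error_diag();

# Data row from the question, assuming one tab between fields.
my $line = qq{"kasperaky support"\t-0\t-0\t-0\t-0};

my @fields;
if ($csv->parse($line)) {
    # fields() returns the values from the most recent parse().
    @fields = $csv->fields;
} else {
    die 'Parse failed: ' . $csv->error_diag();
}

print scalar(@fields), " fields, first is '$fields[0]'\n";
```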

edited Jun 5, 2012 at 17:07

answered Jun 5, 2012 at 16:46

ikegami


1 Comment

AgA Over a year ago

Sorry, it does have a tab as the separator, as @zaid pointed out.

Joel Berger · Accepted Answer · 2012-06-09 16:18:08Z

I always recommend using a parser, and Text::CSV is usually great, but when you are not working with real CSV it can sometimes be a pain. In this case you might try the core module Text::ParseWords.

Here is my example.

#!/usr/bin/env perl

use strict;
use warnings;

use Text::ParseWords qw/parse_line/;

my @data;
while( my $line = <DATA> ) {
  chomp $line;
  my @words = parse_line( qr/\s+/, 0, $line );
  next unless @words;
  push @data, \@words;
}

use Data::Dumper;
print Dumper \@data;

__DATA__

"Keyword"   "Competition"   "Global Monthly Searches"   "Local Monthly Searches (United States)"    "Approximate CPC (Search) - INR"

"kasperaky support" -0  -0  -0  -0

This implementation builds up a 2D array of your data, skipping unused lines. Of course you can build whatever data structure you want once you have parsed the tokens.

$VAR1 = [
          [
            'Keyword',
            'Competition',
            'Global Monthly Searches',
            'Local Monthly Searches (United States)',
            'Approximate CPC (Search) - INR'
          ],
          [
            'kasperaky support',
            '-0',
            '-0',
            '-0',
            '-0'
          ]
        ];
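
If you would rather look fields up by column title, one possible follow-on is to treat the first non-empty line as the header and turn each later line into a hash. This is only a sketch: the shortened `@lines` sample and the `$header` and `@records` names are made up for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Shortened sample in the same shape as the question's data (assumption).
my @lines = (
    qq{"Keyword"\t"Competition"\t"Global Monthly Searches"},
    q{},
    qq{"kasperaky support"\t-0\t-0},
);

my ($header, @records);
for my $line (@lines) {
    my @words = parse_line(qr/\s+/, 0, $line);
    next unless @words;              # skip blank lines

    if (!$header) {                  # first non-empty line holds the titles
        $header = [@words];
        next;
    }

    my %record;
    @record{@$header} = @words;      # hash slice: title => value
    push @records, \%record;
}

print "$records[0]{Keyword}\n";
```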

avrono · Accepted Answer · 2016-04-20 16:08:13Z

0

This worked for me with a file whose fields are separated by one or more spaces. This is a case where Text::CSV does not do the job ...

open(my $data, '<:encoding(UTF-8)', $filename) or die "Cannot open $filename: $!";

while( my $line = <$data> ) {
        # split ' ' splits on runs of whitespace and ignores leading whitespace
        my @fields = split(' ', $line);
        print "\n$line : $fields[0] --- $fields[1] ----- $fields[2]";
}
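
One caveat worth checking against the sample data in the question: `split ' '` knows nothing about quotes, so a quoted field that contains a space is cut in two. The short sketch below shows this on the question's data row; the `$line` value is copied from there with spaces as separators, which is an assumption about the input.

```perl
use strict;
use warnings;

# Data row from the question, with runs of spaces between the fields.
my $line = qq{"kasperaky support" -0  -0  -0  -0\n};

my @fields = split ' ', $line;

# The quoted keyword ends up as two fields, quote marks included.
print scalar(@fields), " fields: ", join('|', @fields), "\n";
```

This approach is therefore a fit only when no field contains whitespace.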

answered Apr 20, 2016 at 16:08

avrono
