2

I have a CSV file in this format:

"Keyword"   "Competition"   "Global Monthly Searches"   "Local Monthly Searches (United States)"    "Approximate CPC (Search) - INR"

"kasperaky support" -0  -0  -0  -0

The first line is the column titles.

I've tried most of the options in Text::CSV, but I'm not able to extract the fields.

Here I'm using sep_char => ' '.

The closest I got was extracting only the first word of the first column ("kasperaky").

I'm creating the object this way (while trying various settings):

my $csv = Text::CSV->new({
    binary             => 1,
    sep_char           => ' ',
    allow_loose_quotes => 0,
    quote_space        => 0,
    quote_char         => '"',
    allow_whitespace   => 0,
    eol                => "\015\012",
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

5 Comments

  • For what it's worth, I tried this and was able to get each field to be parsed fully (e.g. "kasperaky support"), although every individual space outside quote marks was treated as a delimiter--thus I ended up with a lot of empty strings for fields. If you really are dealing with a file that has an unpredictable number of spaces between fields, you may have to massage the input by reducing all strings of spaces to one space before feeding it to Text::CSV. (I used Perl 5.12.4 and Text::CSV 1.21.) Commented Jun 5, 2012 at 16:55
  • @L2G what arguments did you use? Commented Jun 5, 2012 at 17:41
  • I am afraid that pasting the file content here in a textarea mangled it. Upload the file somewhere so we can have a close look at the unchanged content, or provide a hexdump. Commented Jun 6, 2012 at 7:43
  • @AgA: The same arguments as you gave. Commented Jun 8, 2012 at 18:28
  • @daxim here is the file: docs.google.com/open?id=0B7aEugGV1GwTNk84bXlPSkM3dzQ Commented Jun 9, 2012 at 8:13

4 Answers

5

Your CSV is tab-separated. Use the following (code is tested to work against your example file):

use strictures;
use autodie qw(:all);       # automatic error checking open/close
use charnames qw(:full);    # \N named characters
use Text::CSV qw();
my $csv = Text::CSV->new({
    auto_diag   => 2,       # automatic error checking CSV methods
    binary      => 1,
    eol         => "\N{CR}\N{LF}",
    sep_char    => "\N{TAB}",
}) or die 'Cannot use CSV: ' . Text::CSV->error_diag;

open my $fh, '<:encoding(ASCII)', 'computer crash.csv';
while (my $row = $csv->getline($fh)) {
    ...
}
close $fh;
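
If the loop stops after the first row, one variant worth trying is to leave `eol` unset, since a parser without an explicit `eol` accepts LF, CR and CRLF line endings. The following is only a sketch under assumptions: the in-memory sample stands in for the real file, and the empty-row check is an illustrative addition, not part of the answer above.

```perl
use strict;
use warnings;
use Text::CSV;

# Assumed sample data standing in for the real file: tab-separated,
# with an empty line between the header and the data row.
my $sample = join "\n",
    qq{"Keyword"\t"Competition"\t"Global Monthly Searches"},
    q{},
    qq{"kasperaky support"\t-0\t-0},
    q{};

my $csv = Text::CSV->new({
    binary    => 1,
    auto_diag => 1,
    sep_char  => "\t",
    # eol is deliberately not set, so LF, CR and CRLF are all accepted
}) or die 'Cannot use CSV: ' . Text::CSV->error_diag;

# Read from an in-memory filehandle instead of a file on disk.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    # Skip rows in which every field is empty (e.g. blank lines).
    next unless grep { defined && length } @$row;
    push @rows, $row;
}
close $fh;

print join(' | ', @$_), "\n" for @rows;
```

Each element of `@rows` is an array reference holding the fields of one non-empty line.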

3 Comments

Thanks @Zaid for pointing that out ("\t"). It works, but it reads only the first line; it doesn't move on to the second line.
@AgA Do you really have an empty line between the column names and the first line of data? If so, I would leave that out.
@L2G I placed a link to my input file above. The file is exactly as downloaded from the Google Keyword Tool. I've managed without Text::CSV, which is only a "temporary fix", but I want to use Text::CSV for compatibility.
4

To call that a CSV file is a bit of a stretch! Your separator isn't a space, it's a sequence of one or more spaces, and Text::CSV doesn't handle that. (allow_whitespace doesn't work when your separator is a space, unfortunately.) You could use something like:

use List::MoreUtils qw( apply );
my @fields = apply { s/\\(.)/$1/sg } $line =~ /"((?:[^"\\]|\\.)*)"/sg;

Now, if those are tabs, that's a different story, and you could use sep_char => "\t".
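
Note that the regex above captures only the quoted fields, so the unquoted `-0` values in the sample would be dropped. A minimal sketch that also picks up bare tokens, using core Perl only, could look like this; `split_fields` is a hypothetical helper name, not something from List::MoreUtils or Text::CSV.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into fields separated by runs of
# whitespace, where a field is either double-quoted (with backslash
# escapes) or a bare run of non-whitespace characters.
sub split_fields {
    my ($line) = @_;
    my @fields;
    while ($line =~ /\G\s*(?:"((?:[^"\\]|\\.)*)"|(\S+))/gs) {
        # Copy the captures before running another regex.
        my ($quoted, $bare) = ($1, $2);
        if (defined $quoted) {
            $quoted =~ s/\\(.)/$1/sg;    # unescape \x to x
            push @fields, $quoted;
        }
        else {
            push @fields, $bare;
        }
    }
    return @fields;
}

my @fields = split_fields('"kasperaky support" -0  -0  -0  -0');
print scalar(@fields), " fields: ", join(' | ', @fields), "\n";
```

The `\G` anchor makes each match start where the previous one ended, so the loop walks the line left to right and stops at the first position where neither alternative matches.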

1 Comment

Sorry, it does have a tab as the separator, as @zaid pointed out.
1

I always recommend using a parser, and usually Text::CSV is great, but when you are not working with real CSV sometimes it can be a pain. You might try using the core module Text::ParseWords in this case.

Here is my example.

#!/usr/bin/env perl

use strict;
use warnings;

use Text::ParseWords qw/parse_line/;

my @data;
while( my $line = <DATA> ) {
  chomp $line;
  my @words = parse_line( qr/\s+/, 0, $line );
  next unless @words;
  push @data, \@words;
}

use Data::Dumper;
print Dumper \@data;

__DATA__

"Keyword"   "Competition"   "Global Monthly Searches"   "Local Monthly Searches (United States)"    "Approximate CPC (Search) - INR"

"kasperaky support" -0  -0  -0  -0

This implementation builds up a 2D array of your data, skipping unused lines. Of course you can build whatever data structure you want once you have parsed the tokens.

$VAR1 = [
          [
            'Keyword',
            'Competition',
            'Global Monthly Searches',
            'Local Monthly Searches (United States)',
            'Approximate CPC (Search) - INR'
          ],
          [
            'kasperaky support',
            '-0',
            '-0',
            '-0',
            '-0'
          ]
        ];
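
As one example of a different data structure, the parsed tokens could be turned into one hash per data row, keyed by the column titles. This is only a sketch: the shortened sample lines and the `$header` / `@records` names are assumptions made for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw/parse_line/;

# Shortened sample lines (an assumption, to keep the sketch compact).
my @lines = (
    q{"Keyword"   "Competition"   "Global Monthly Searches"},
    q{},
    q{"kasperaky support" -0  -0},
);

my ($header, @records);
for my $line (@lines) {
    my @words = parse_line( qr/\s+/, 0, $line );
    next unless @words;              # skip empty lines

    # The first non-empty line supplies the column titles.
    if ( !$header ) {
        $header = \@words;
        next;
    }

    # Hash slice: pair each column title with the matching value.
    my %record;
    @record{@$header} = @words;
    push @records, \%record;
}

print "$_->{Keyword}: competition $_->{Competition}\n" for @records;
```

A hash per row lets later code refer to fields by column title instead of by position.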


0

This worked for me with a file whose fields are separated by one or more spaces. This is a case where Text::CSV does not do the job.

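open(my $data, '<:encoding(UTF-8)', $filename)
    or die "Cannot open $filename: $!";

while (my $line = <$data>) {
    chomp $line;
    # split(' ') splits on runs of whitespace and ignores leading whitespace;
    # note that a quoted field containing spaces is split into several fields.
    my @fields = split(' ', $line);
    next if @fields < 3;    # skip blank or short lines
    print "$line : $fields[0] --- $fields[1] ----- $fields[2]\n";
}
close $data;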
