0

I have a CSV file like this:

id,item,itemtype,date,service,level,message,action,user
"344","-1","IRM","2008-08-22 13:01:57","login","1","Failed login: \'irm\', database \'irmD\'",NULL,NULL
"346","-1","IRM","2008-08-27 10:58:59","login","1","Ошибка входа:\'\', база данных \'irmD\'",NULL,NULL

The second line parses fine, but Text::CSV silently skips the third one. The third line contains Cyrillic characters, but the file is encoded in UTF-8, so Perl shouldn't have any problems with that.

And the code:

#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
use utf8;

my $file = 'Test.csv';
my $csv = Text::CSV->new();
open (CSV, "<", $file) or die $!;
while (<CSV>) {
    if ($csv->parse($_)) {
        if ($. == 1) {
            next;
        }
        my @columns = $csv->fields();
        my $id=$columns[0];
        print $id." ";
    }
}
print "\n";
close CSV;

Any help or hint will be appreciated.

3
  • It wouldn't be the first library that just throws up on parsing UTF-8. Commented Dec 18, 2014 at 10:03
  • I parsed JSON files full to the brim of UTF-8, but it wasn't a problem. Commented Dec 18, 2014 at 10:06
  • Different library though. It looks like there are options in Text::CSV for UTF-8 processing. Commented Dec 18, 2014 at 10:10

2 Answers

3

Did you read the documentation of Text::CSV?

If your data contains newlines embedded in fields, or characters above 0x7e (tilde), or binary data, you must set "binary => 1"

Also, use utf8 tells Perl that the source code itself is written in UTF-8; it has no effect on the data you read. Remove it.

The documentation also warns against reading CSV line by line with <>:

while (<>) {           #  WRONG!

Here is a working version:

#!/usr/bin/perl
use warnings;
use strict;

use Text::CSV;

my $file = 'Test.csv';
my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;
open my $CSV, '<', $file or die $!;
while (my $line = $csv->getline($CSV)) {
    next if 1 == $.;

    my @columns = @$line;
    my $id = $columns[0];
    print $id . " ";
}
print "\n";
close $CSV;
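
The question's loop drops a row whenever parse() returns false, which is why the third line seems to vanish without a trace. A minimal sketch of how the failure could be surfaced instead, assuming the same Test.csv as in the question and deliberately leaving binary off to reproduce the original setup (it also reads line by line, as the question does, which is not how CSV should be read in general):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $file = 'Test.csv';    # file name taken from the question

# Deliberately no binary => 1, to reproduce the question's setup.
my $csv = Text::CSV->new() or die Text::CSV->error_diag;

open my $fh, '<', $file or die "Cannot open $file: $!";
while (my $line = <$fh>) {
    next if $. == 1;    # skip the header row
    if ($csv->parse($line)) {
        my @columns = $csv->fields();
        print "line $.: id $columns[0]\n";
    }
    else {
        # In list context error_diag returns the error code, the message
        # and the position in the input where parsing stopped.
        my ($code, $message, $pos) = $csv->error_diag;
        warn "line $.: parse failed ($code: $message) at position $pos\n";
    }
}
close $fh;
```

The exact code and message depend on the installed Text::CSV backend and version, so treat this as a diagnostic aid rather than a fix.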

0

I think the problem is that use utf8 only affects how Perl reads your source code, not your data. From http://perldoc.perl.org/utf8.html:

utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code

Looking at the Text::CSV documentation, you probably want:

$csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });

You will probably also need to tell Perl that the file is UTF-8. You can do this either in the open call or afterwards with binmode:

open ( my $filehandle, "<:encoding(UTF-8)", "Test.csv" ) or die $!;
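
For the binmode route mentioned above, here is a rough sketch. It assumes plain Text::CSV with binary => 1 rather than Text::CSV::Encoded, reuses the Test.csv file name from the question, and relies on the column positions from the sample data (id first, message seventh):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $file = 'Test.csv';    # file name taken from the question

# binary => 1 lets the parser accept characters above 0x7e.
my $csv = Text::CSV->new({ binary => 1 })
    or die Text::CSV->error_diag;

# Open the file without a layer, then add the decoding layer with binmode.
open my $fh, '<', $file or die "Cannot open $file: $!";
binmode $fh, ':encoding(UTF-8)';

# Encode output as UTF-8 so decoded characters print without
# "Wide character" warnings.
binmode STDOUT, ':encoding(UTF-8)';

my $header = $csv->getline($fh);    # read and discard the header row
while (my $row = $csv->getline($fh)) {
    print "$row->[0]: $row->[6]\n";    # id and message columns
}
close $fh;
```

Passing the layer in the open mode and calling binmode afterwards should be equivalent here; binmode is mainly useful when the handle was opened elsewhere.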
