
I am trying to parse a CSV file in Perl and paste the information from some columns into an XML file. I've never done anything in Perl, and my idea was to store the data in an array and then pull the information out of the array as I build the XML.

I'm sure I am doing several things wrong, since instead of the value I expect I get what looks like a memory address, for example ARRAY(0x35e9360).

Could somebody help me out and point me to a better solution?

Here is the code in question:

use Text::CSV;
use utf8;
use XML::Simple qw(XMLout);
use XML::Twig;
use File::Slurp;
use Encode;

&buildXML();

my $csv = Text::CSV->new( { binary => 1 } )    # should set binary attribute.
        or die "Cannot use CSV: " . Text::CSV->error_diag();

$csv = Text::CSV->new( { sep_char => '|' } );
$csv = Text::CSV_XS->new( { allow_loose_quotes => 1 } );

my $t = XML::Twig->new( pretty_print => indented );
$t->parsefile('output.xml');

$out_file = "output.xml";
open( my $fh_out, '>>', $out_file ) or die "unable to open $out_file for writing: $!";

my $root = $t->root;                           #get the root

open my $fh, "<:encoding(utf8)", "b.txt" or die "text.txt: $!";

while ( my $row = $csv->getline($fh) ) {

    my @rows = $row;

    $builds = $root->first_child();            # get the builds node
    $xcr    = $builds->first_child();          #get the xcr node

    my $xcrCopy = $xcr->copy();                #copy the xcr node
    $xcrCopy->paste( after, $xcr );            #paste the xcr node

    $xcr->set_att( id => "@rows[0]" );
    print {$fh_out} $t->sprint();
}

$csv->eof or $csv->error_diag();

Here is a test file:

ID|Name|Pos
1|a|265
2|b|950
3|c|23
4|d|798
5|e|826
6|f|935
7|g|852
8|h|236
9|i|642

Here is the XML that is built by the buildXML() sub:

<?xml version='1.0' standalone='yes'?>
<project>
  <builds>
    <xcr id="" name="" pos="" />          
  </builds>
</project>
Comments
  • The getline method returns an arrayref; just as you say, you are seeing a reference to an array. You need my @rows = @$row to dereference it into an array. As for the rest, can you post a file (b.txt) that I can test with? Commented Jun 28, 2016 at 7:21
  • You will also need output.xml, unless you mean to write it from scratch (but then the code for it is wrong). Commented Jun 28, 2016 at 8:03
  • You've really written far too much code before you started testing, ending up with a lot of confused ideas and trial code that obscure the overall program. You should start by writing just a few lines that extract the values from the CSV and no more. Also, always put use strict and use warnings 'all' at the top of every Perl program. Don't use an ampersand & when calling subroutines: just buildXML() is correct, and whatever resource you are using to learn Perl is very out of date if it tells you otherwise. Commented Jun 28, 2016 at 8:31
  • @zdim I am building it from scratch. There is a sub buildXML. It builds a very simple structure. I will add it to my original question. Commented Jun 28, 2016 at 8:37
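Following the advice to start with just a few lines that only extract the values, a minimal sketch of that first step might look like this. It assumes the fields never contain quoted or embedded | characters (true for the test file above), so it uses core Perl's split instead of Text::CSV; the in-memory string is a hypothetical stand-in for b.txt.

```perl
use strict;
use warnings;

# Hypothetical stand-in for b.txt: the first rows of the question's test file,
# held in a string so the sketch runs without a file on disk.
my $data = <<'END_CSV';
ID|Name|Pos
1|a|265
2|b|950
3|c|23
END_CSV

# Open the string as an in-memory filehandle (core Perl, no modules needed).
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my $header = <$fh>;    # read past the header line
my @records;

while ( my $line = <$fh> ) {
    chomp $line;
    # split takes a regex, so the literal | has to be escaped
    my @cols = split /\|/, $line;
    push @records, [@cols];    # keep a reference to a copy of this row
    print "id=$cols[0] name=$cols[1] pos=$cols[2]\n";
}
close $fh;
```

Once this prints the expected values, the XML step can be added on top of it. For real CSV input with quoting, Text::CSV should replace the split.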

2 Answers

This program appears to do what you require.

After reverse-engineering your code to discover what you're aiming for, I find that it's really a fairly simple problem. It would have helped a lot if you had explained your intention in terms of adding a new xcr element for each line in the CSV file, with attributes corresponding to the columns.

It's likely that you don't need the XML template file at all, or perhaps just the template xcr element with empty attributes is superfluous? I also wonder whether you want to skip the header line of the CSV file. These changes are trivial, but I have left the code in the simplest state possible.
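Instead of being skipped, the header line can also supply the attribute names, so each xcr element's attributes come straight from the columns. The following is a minimal core-Perl sketch under the same assumption as the data above (plain |-separated fields, so split stands in for Text::CSV). The helper name rows_to_attrs and the lower-casing of the header names are illustrative choices, not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a header line plus data lines into a list of
# attribute hashes, one per row, keyed by the lower-cased header names.
sub rows_to_attrs {
    my ( $header_line, @lines ) = @_;
    chomp( my $h = $header_line );
    my @names = map { lc } split /\|/, $h;    # ID|Name|Pos -> id, name, pos

    my @attrs;
    for my $line (@lines) {
        chomp( my $l = $line );
        my %attr;
        @attr{@names} = split /\|/, $l;       # hash slice pairs names with values
        push @attrs, \%attr;
    }
    return @attrs;
}

my @attrs = rows_to_attrs( "ID|Name|Pos\n", "1|a|265\n", "2|b|950\n" );
for my $attr (@attrs) {
    print join( ' ', map { qq{$_="$attr->{$_}"} } sort keys %$attr ), "\n";
}
```

Each hash reference has the same shape as the literal attribute hash passed to XML::Twig::Elt->new in the second program below.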

use utf8;
use strict;
use warnings 'all';
use autodie;

use Text::CSV;
use XML::Twig;
use Encode;

use constant XML_FILE => 'output.xml';
use constant CSV_FILE => 'b.txt';

build_xml(XML_FILE);

my $csv = Text::CSV->new( {
    sep_char           => '|',
    binary             => 1,
    allow_loose_quotes => 1,   # Carried over from the question; probably unnecessary
} );

my $t = XML::Twig->new(
    pretty_print => 'indented',
);

$t->parsefile(XML_FILE);
my ($xcr) = $t->findnodes('/project/builds/xcr');

open my $fh, '<:encoding(utf8)', CSV_FILE;

while ( my $row = $csv->getline($fh) ) {

    my ($id, $name, $pos) = @$row;

    my $xcr_copy = $xcr->copy;
    $xcr_copy->set_att( id => $id, name => $name, pos => $pos );
    $xcr_copy->paste( last_child => $xcr->parent );
}

$t->print;


sub build_xml {

    open my $fh, '>', shift;

    print $fh <<__END_XML__;
<?xml version='1.0' standalone='yes'?>
<project>
  <builds>
    <xcr id="" name="" pos="" />          
  </builds>
</project>
__END_XML__

}

output

<?xml version="1.0" standalone="yes"?>
<project>
  <builds>
    <xcr id="" name="" pos=""/>
    <xcr id="ID" name="Name" pos="Pos"/>
    <xcr id="1" name="a" pos="265"/>
    <xcr id="2" name="b" pos="950"/>
    <xcr id="3" name="c" pos="23"/>
    <xcr id="4" name="d" pos="798"/>
    <xcr id="5" name="e" pos="826"/>
    <xcr id="6" name="f" pos="935"/>
    <xcr id="7" name="g" pos="852"/>
    <xcr id="8" name="h" pos="236"/>
    <xcr id="9" name="i" pos="642"/>
  </builds>
</project>



After reading your comment (information like this should be edited into the question) saying "I am building [the XML data] from scratch. There is a sub buildXML", I think this is more likely to be what you require. With XML::Twig it is simplest to parse the fixed skeleton of the document from XML text instead of creating and linking individual XML::Twig::Elt objects for it.

The $t object starts with no xcr elements at all. They are all created through XML::Twig::Elt->new and pasted as the last_child of the builds element.
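If adding XML::Twig is not an option, the same from-scratch idea (an empty builds element with one xcr appended per row) can be sketched with plain string building. This is a different technique from the answer's: it only works because the structure is fixed and flat, and attribute values must be escaped by hand. The helper names xml_escape and build_project_xml are hypothetical.

```perl
use strict;
use warnings;

# Escape the five characters that are special in XML attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Build one <xcr .../> element per row, then wrap them in <project><builds>.
sub build_project_xml {
    my @rows = @_;    # each row is an array ref: [id, name, pos]
    my @elts = map {
        my ( $id, $name, $pos ) = map { xml_escape($_) } @$_;
        qq{    <xcr id="$id" name="$name" pos="$pos"/>\n}
    } @rows;
    return qq{<?xml version="1.0" encoding="UTF-8"?>\n}
         . "<project>\n  <builds>\n"
         . join( '', @elts )
         . "  </builds>\n</project>\n";
}

my $xml = build_project_xml( [ 1, 'a', 265 ], [ 2, 'b&c', 950 ] );
print $xml;
```

The escaping step is the part that is easy to get wrong by hand, which is one reason a real XML library remains preferable once the structure grows.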

require v5.14.1;  # For autodie

use utf8;
use strict;
use warnings 'all';
use autodie;

use Text::CSV;
use XML::Twig;
use Encode;

use constant XML_FILE => 'output.xml';
use constant CSV_FILE => 'b.txt';

my $t = XML::Twig->new(
    pretty_print => 'indented',
);

$t->parse(<<END_XML);
<project>
  <builds/>
</project>
END_XML

my ($builds) = $t->findnodes('/project/builds');


my $csv = Text::CSV->new( {
    sep_char => '|',
    binary => 1,
    allow_loose_quotes => 1,
} );

{
    open my $fh, '<:encoding(utf8)', CSV_FILE;
    <$fh>; # Drop the header line

    while ( my $row = $csv->getline($fh) ) {

        my ($id, $name, $pos) = @$row;

        my $xcr = XML::Twig::Elt->new(xcr => {
            id   => $id,
            name => $name,
            pos  => $pos
        });

        $xcr->paste( last_child => $builds );
    }
}

open my $fh, '>:encoding(UTF-8)', XML_FILE;
$t->set_output_encoding('UTF-8');
$t->print($fh, 'indented');

output

<?xml version="1.0" encoding="UTF-8"?><project>
  <builds>
    <xcr id="1" name="a" pos="265"/>
    <xcr id="2" name="b" pos="950"/>
    <xcr id="3" name="c" pos="23"/>
    <xcr id="4" name="d" pos="798"/>
    <xcr id="5" name="e" pos="826"/>
    <xcr id="6" name="f" pos="935"/>
    <xcr id="7" name="g" pos="852"/>
    <xcr id="8" name="h" pos="236"/>
    <xcr id="9" name="i" pos="642"/>
  </builds>
</project>

The getline method of Text::CSV returns an array reference. From its documentation:

It reads a row from the IO object $io using $io->getline () and parses this row into an array ref.

The ARRAY(0x35e9360) is indeed what you get when you print an array reference. This is normal; many parsers return a reference to an array for each row. So you need to dereference it, generally with @{$arrayref}, but when the reference is held in a simple scalar variable there is no ambiguity and the braces can be dropped: @$arrayref.
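To see the difference in isolation, here is a minimal sketch in which a literal array reference stands in for the value getline returns (an assumption so that it runs without Text::CSV or a data file):

```perl
use strict;
use warnings;

# Stand-in for what $csv->getline($fh) returns: a reference to an array
# holding one parsed row (values taken from the question's test file).
my $row = [ '1', 'a', '265' ];

print "$row\n";         # stringifies the reference, e.g. ARRAY(0x...)
my @cols  = @$row;      # dereference: copy the elements into an array
my @cols2 = @{$row};    # same thing with explicit braces
print "@cols\n";        # prints: 1 a 265
```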

use warnings;
use strict;
use Text::CSV_XS;
use XML::Twig;

my $csv = Text::CSV_XS->new (
    { binary => 1, sep_char => '|',  allow_loose_quotes => 1 }
) or die "Cannot use CSV: " . Text::CSV_XS->error_diag();

my $t = XML::Twig->new(pretty_print => 'indented');
$t->parsefile('output.xml');
my $out_file = 'output.xml';
open my $fh_out, '>>', $out_file  or die "Can't open $out_file for append: $!";
my $root = $t->root;

my $file = 'b.txt';
open my $fh, "<:encoding(UTF-8)", $file  or die "Can't open $file: $!";

while (my $rowref = $csv->getline($fh)) {
    #my @cols = @$rowref;
    #print "@cols\n";

    my $builds = $root->first_child();  # get the builds node
    my $xcr = $builds->first_child();   # get the xcr node
    my $xcrCopy = $xcr->copy();         # copy the xcr node
    $xcrCopy->paste('after', $xcr);     # paste the xcr node
    $xcr->set_att(id => $rowref->[0]);  # or $cols[0];

    print $fh_out $t->sprint();
}

For the CSV file, this prints the following (when the two commented-out lines that build and print @cols are enabled):

ID Name Pos
1 a 265
2 b 950
...

So we've read the file OK.

The XML processing is copied from the question, except for the part that uses the CSV value. We take the first element of the current row, which is $rowref->[0] since $rowref is a reference. (Or use an element from the dereferenced array, $cols[0].)
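The access forms mentioned above can be compared side by side in a small sketch. Again, a literal array reference is assumed to stand in for one row returned by getline:

```perl
use strict;
use warnings;

my $rowref = [ '1', 'a', '265' ];    # stand-in for one getline() result

my $id_arrow  = $rowref->[0];        # arrow notation directly on the reference
my $id_braces = ${$rowref}[0];       # the same element via an explicit dereference
my @cols      = @$rowref;            # or dereference the whole row first...
my $id_copy   = $cols[0];            # ...and index the plain array

print "$id_arrow $id_braces $id_copy\n";    # prints: 1 1 1
```

The arrow form avoids copying the row, which is why the code above uses $rowref->[0] directly.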

I don't know what output is expected, but it is built from the template and seems OK for this code.


Note: a single element of an array is a scalar, so it takes the $ sigil: $cols[0]. If you want to extract several columns you can use an array slice; its result is a list, so it takes the @ sigil. For example, @cols[0,2] is a list of the first and third elements, which can then be assigned to a list of scalars: my ($c1, $c3) = @cols[0,2];
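A short sketch of these sigil rules, using the values of the first data row as an assumed example. The last slice shows that the same slice can be taken directly through the reference, without first copying the row into @cols:

```perl
use strict;
use warnings;

my @cols   = ( '1', 'a', '265' );    # one row, already dereferenced
my $rowref = \@cols;                 # the same row as a reference

my $first       = $cols[0];              # single element: a scalar, so $
my @first_third = @cols[0, 2];           # slice: a list, so @
my ( $c1, $c3 ) = @cols[0, 2];           # slice assigned to a list of scalars
my ( $r1, $r3 ) = @{$rowref}[0, 2];      # the same slice taken through the reference

print "$c1 $c3\n";                       # prints: 1 265
```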

Comments
  • Thank you so far, but I am now getting the whole rows instead of the one value I wanted to get.
  • @UsefulUserName I've added to the code to get all elements from the CSV and print them out, so you can see how it's packed. As for XML, it'd be nice to have output.xml.
  • @UsefulUserName Thank you for the XML template. I've added the XML part to the code, and now it's processing the whole thing. It does build output.xml, but I don't know whether it is exactly what you want (it looks reasonable).
  • @UsefulUserName I am certain I posted a comment to you here, but now it's gone (?). Sorry if this repeats: the code above has been cleaned up considerably. For one thing, your code calls new more than once, and this shouldn't be done. It is now far simpler, as it should be. As for XML, it does exactly what you already have in the question.
  • This works fine for me, and I am using this solution since it leaves me some options that I need for other things I am working on.
