I am trying to parse XML using the XML::LibXML module. The XML has <row> nodes, each of which encloses two child nodes, <key> and <value>. I want to parse each <row> and build a hash data structure from it. I came up with the code below, but I feel there must be a better way to do it.

use strict;
use warnings;

use Data::Dumper;
use XML::LibXML;

my $XML=<<EOF;
<config>
    <row>
        <key>
            <A1>alpha</A1>
            <A2>beta</A2>
            <A3>cat</A3>
            <A4>delta</A4>
        </key>
        <value>
            <B1>eclipse</B1>
            <B2>pico</B2>
            <B3>penta</B3>
            <B4>zeta</B4>
        </value>
    </row>
    <row>
        <key>
            <A1>tom</A1>
            <A2>harry</A2>
            <A3>bob</A3>
            <A4>ben</A4>
        </key>
        <value>
            <B1>TAP</B1>
            <B2>MAN</B2>
            <B3>WORK</B3>
            <B4>MAINTAIN</B4>
        </value>
    </row>
</config>
EOF

my $parser = XML::LibXML->new();
my $doc  = $parser->parse_string($XML);

my %hash;
my $i = 1;

foreach my $node ($doc->findnodes('/config/row/key')) {
    foreach my $tag ('A1', 'A2','A3','A4') {
        $hash{'KEY' . $i}{$tag} = $node->findvalue( $tag );
    }
    $i++;
}

$i = 1;

foreach my $node ($doc->findnodes('/config/row/value')) {
    foreach my $tag ('B1', 'B2','B3','B4') {
        $hash{'KEY' . $i}{$tag} = $node->findvalue( $tag );
    }
    $i++;
}

print Dumper \%hash;

Output

$VAR1 = {
          'KEY2' => {
                      'A3' => 'bob',
                      'B3' => 'WORK',
                      'B1' => 'TAP',
                      'A1' => 'tom',
                      'B4' => 'MAINTAIN',
                      'B2' => 'MAN',
                      'A2' => 'harry',
                      'A4' => 'ben'
                    },
          'KEY1' => {
                      'A3' => 'cat',
                      'B3' => 'penta',
                      'B1' => 'eclipse',
                      'A1' => 'alpha',
                      'B4' => 'zeta',
                      'B2' => 'pico',
                      'A2' => 'beta',
                      'A4' => 'delta'
                    }
        };

Actually, instead of creating artificial keys (KEY1, KEY2, ...), I would like the value of the <A1> node to be used as the key for each section. Can someone please help me out here?

Desired output:

'tom'   => {
             'A3' => 'bob',
             'B3' => 'WORK',
             'B1' => 'TAP',

             'B4' => 'MAINTAIN',
             'B2' => 'MAN',
             'A2' => 'harry',
             'A4' => 'ben'
           },
'alpha' => {
             'A3' => 'cat',
             'B3' => 'penta',
             'B1' => 'eclipse',

             'B4' => 'zeta',
             'B2' => 'pico',
             'A2' => 'beta',
             'A4' => 'delta'
           }

2 Answers

"I would like to have <A1> node's value to be considered as key for each section"

This solution creates a hash for each row element and pushes a reference to it onto the @rows array. Unlike the original, it reads the XML data from a file called config.xml

The tags for the A* and B* elements are ignored -- it is simply assumed that the keys and values are in the same order

The main loop iterates over the row elements, and for each row, a list of the key and value child elements is converted to their text values with a map. Then a hash is built and pushed onto the array

I've used Data::Dump to display the resulting data structure as I believe it is far superior to Data::Dumper

use strict;
use warnings;

use XML::LibXML;

my $doc = XML::LibXML->load_xml( location => 'config.xml' );

my @rows;

for my $row ($doc->findnodes('/config/row')) {

    my @keys   = map $_->textContent, $row->findnodes('key/*');
    my @values = map $_->textContent, $row->findnodes('value/*');

    my %row;
    @row{@keys} = @values;
    push @rows, \%row;
}

use Data::Dump;
dd \@rows;

output

[
  { alpha => "eclipse", beta => "pico", cat => "penta", delta => "zeta" },
  { ben => "MAINTAIN", bob => "WORK", harry => "MAN", tom => "TAP" },
]
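
Since the pairing above is purely positional, a row with unequal numbers of key and value children would silently produce a wrong or partly undefined hash. Here is a minimal sketch of the same hash-slice technique with a count check added. The inline two-element sample and the die message are my own assumptions for illustration, not part of the original data

```perl
use strict;
use warnings;

use XML::LibXML;

# Inline sample (an assumption) so the sketch runs without config.xml
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<config>
    <row>
        <key><A1>alpha</A1><A2>beta</A2></key>
        <value><B1>eclipse</B1><B2>pico</B2></value>
    </row>
</config>
XML

my @rows;

for my $row ( $doc->findnodes('/config/row') ) {

    my @keys   = map { $_->textContent } $row->findnodes('key/*');
    my @values = map { $_->textContent } $row->findnodes('value/*');

    # The pairing is positional, so refuse rows that cannot pair up
    die "Row has ", scalar @keys, " keys but ", scalar @values, " values\n"
        unless @keys == @values;

    my %row;
    @row{@keys} = @values;    # hash slice: assigns every pair at once
    push @rows, \%row;
}
```

Whether to die or just skip a mismatched row depends on how much you trust the input; die is simply the loudest option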

Update

Here's a variation that complies with your desired output. Thanks to choroba for pointing it out to me

It's a very similar approach to my original one above, but it builds a hash instead of an array and uses the elements' tag names as keys instead of the key/value relationship that I guessed you would want

I should say that I'm very doubtful about your choice of data structure; for instance, I see no need to exclude the A1 key from the subsidiary hash just because its value is used to identify the row. I would also be surprised if it wouldn't be better to use the key and value strings as keys and values. But it may also be that the XML tag names are badly chosen and your choice is optimal, and I have no way of knowing

Here's the Perl code, which reads from the config.xml file as before. If you would prefer to keep the A1 hash element as I described, set $section from the first item of each row and then do the assignment to %data unconditionally, instead of choosing between the two in an if/else

use strict;
use warnings;

use XML::LibXML;

my $doc = XML::LibXML->load_xml( location => 'config.xml' );

my %data;

for my $row ( $doc->findnodes('/config/row') ) {

    # Reset for every row: the first item (A1) supplies the section name
    my $section;

    for my $item ( $row->findnodes('key/* | value/*') ) {

        my ($key, $val) = ( $item->tagName, $item->textContent );

        if ( defined $section ) {
            $data{$section}{$key} = $val
        }
        else {
            $section = $val;
        }
    }
}

use Data::Dump;
dd \%data;

output

{
  alpha => {
    A2 => "beta",
    A3 => "cat",
    A4 => "delta",
    B1 => "eclipse",
    B2 => "pico",
    B3 => "penta",
    B4 => "zeta",
  },
  tom => {
    A2 => "harry",
    A3 => "bob",
    A4 => "ben",
    B1 => "TAP",
    B2 => "MAN",
    B3 => "WORK",
    B4 => "MAINTAIN",
  },
}
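
For the variant that keeps the A1 element inside each subsidiary hash, something like the following sketch should do. It assumes Perl 5.10 or later for the //= operator, and it uses a cut-down inline sample of my own instead of config.xml. It also relies, as the code above does, on the A1 element being the first child of key in every row

```perl
use strict;
use warnings;

use XML::LibXML;

# Cut-down inline sample (an assumption) so the sketch is self-contained
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<config>
    <row>
        <key><A1>alpha</A1><A2>beta</A2></key>
        <value><B1>eclipse</B1></value>
    </row>
    <row>
        <key><A1>tom</A1><A2>harry</A2></key>
        <value><B1>TAP</B1></value>
    </row>
</config>
XML

my %data;

for my $row ( $doc->findnodes('/config/row') ) {

    my $section;    # fresh for each row

    for my $item ( $row->findnodes('key/* | value/*') ) {

        my ( $key, $val ) = ( $item->nodeName, $item->textContent );

        # The first item seen in the row names the section ...
        $section //= $val;

        # ... and every item, including that first one, is stored
        $data{$section}{$key} = $val;
    }
}
```

With this change each section's hash carries its own name under A1, which can be handy if the inner hashes are ever passed around without the outer key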

The first XPath expression selects the A1s, the second one selects all the A* and B* in the same row (except the A1 itself).

#! /usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my $xmlstring = << '__XML__';
<config>
    ...
</config>
__XML__

my $xml = 'XML::LibXML'->load_xml(string => $xmlstring);
my $root = $xml->documentElement;

my %hash;
for my $a1 ($root->findnodes('/config/row/key/A1')) {
    for my $node ($a1->findnodes('(../../key/*[not(self::A1)] | ../../value/*)')) {
        $hash{ $a1->textContent }{ $node->getName } = $node->textContent;
    }
}

use Data::Dump;
dd \%hash;

output

{
  alpha => {
    A2 => "beta",
    A3 => "cat",
    A4 => "delta",
    B1 => "eclipse",
    B2 => "pico",
    B3 => "penta",
    B4 => "zeta",
  },
  tom => {
    A2 => "harry",
    A3 => "bob",
    A4 => "ben",
    B1 => "TAP",
    B2 => "MAN",
    B3 => "WORK",
    B4 => "MAINTAIN",
  },
}
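
If the ../../ steps in the second expression are hard to read, roughly the same result can be had by iterating over the row elements and fetching the A1 text with findvalue. This is only a sketch of that alternative: the shortened inline sample and the decision to skip rows with no A1 are my assumptions, not something the question specifies

```perl
use strict;
use warnings;

use XML::LibXML;

# Shortened inline sample (an assumption) in place of the full document
my $xml = XML::LibXML->load_xml( string => <<'__XML__' );
<config>
    <row>
        <key><A1>alpha</A1><A2>beta</A2></key>
        <value><B1>eclipse</B1></value>
    </row>
    <row>
        <key><A1>tom</A1><A2>harry</A2></key>
        <value><B1>TAP</B1></value>
    </row>
</config>
__XML__

my %hash;

for my $row ( $xml->findnodes('/config/row') ) {

    # findvalue returns the string value, or '' if nothing matches
    my $name = $row->findvalue('key/A1');
    next unless length $name;    # assumption: ignore rows without an A1

    # Everything in the row except the A1 that names it
    for my $node ( $row->findnodes('key/*[not(self::A1)] | value/*') ) {
        $hash{$name}{ $node->nodeName } = $node->textContent;
    }
}
```

Both XPath expressions here are relative to the row, so there is no need to climb back up the tree from the A1 node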

10 Comments

Why the quotes in 'XML::LibXML'->load_xml ? I'm sure you know that they're optional in a class method call. I also think you need some narrative.
@Borodin: regarding the quotes, see stackoverflow.com/a/16656174/1030675. They are nicer than XML::LibXML::, but a bit less powerful.
@Borodin: The code was tested, please don't change it in a way that changes its output.
@Borodin: Here's where I learned to quote class names: perlmonks.org/?node_id=980498
I apologise for my edit, but after all your original solution has no output. I've added just a dump and the corresponding output. I hope you can see what I meant by my original changes?
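
The quoting question raised in these comments can be illustrated with a small sketch. As I understand it, a bareword class name may be compiled as a function call when a sub with that fully qualified name already exists, whereas the quoted form and the trailing :: form always name the class. The packages Foo, Foo::Bar and Baz below are made up for the demonstration and have nothing to do with XML::LibXML

```perl
use strict;
use warnings;

package Foo;
sub Bar { return 'Baz' }    # a sub whose full name is Foo::Bar

package Foo::Bar;
sub hello { return 'hello from Foo::Bar' }

package Baz;
sub hello { return 'hello from Baz' }

package main;

# Bareword: compiled as Foo::Bar()->hello, so the method is looked up
# in whatever class name the sub returns (here 'Baz')
my $bare = Foo::Bar->hello;

# Quoted string: always treated as the class name
my $quoted = 'Foo::Bar'->hello;

# Trailing double colon: also always the class name
my $colons = Foo::Bar::->hello;

print "$_\n" for $bare, $quoted, $colons;
```

A clash like this is unlikely for a well-known module name such as XML::LibXML, which is presumably why most code gets away with the bareword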