I am trying to parse an XML file with the following contents:
<?xml version="1.0" encoding="UTF-8"?>
<sentences>
<lastmodified>none</lastmodified>
<sentencedefs xml:lang="common">
</sentencedefs>
<sentencedefs xml:lang="en-US">
<baselanguage xml:lang="en-US"/>
</sentencedefs>
</sentences>
The Perl code I use to parse it looks like this (a cut-down version of the key portion):
use 5.006_001;
use strict;
use warnings;
use English '-no_match_vars';
use XML::LibXML;
my $SENTENCEDEFS = "sentencedefs";
my $LANG = "lang";
my $lParser = XML::LibXML->new;
my $lSentencesDoc = $lParser->parse_file("sentences.xml");
my $lSentencesRoot = $lSentencesDoc->documentElement();
my @lSentenceDefs = $lSentencesRoot->getElementsByTagName($SENTENCEDEFS);
foreach my $lDefs (@lSentenceDefs)
{
    # Dump every attribute the parser sees on this element.
    my @lAttrs = $lDefs->attributes();
    foreach my $lAttr (@lAttrs)
    {
        print("Attr: " . $lAttr->toString(1) . "\n");
    }
    # Look the attribute up by its unprefixed name.
    my $lLang = $lDefs->getAttribute($LANG);
    my $lFound = defined($lLang);
    print("Found $LANG? $lFound \n");
}
Until now I have been using XML::LibXML V1.58. I am now testing against XML::LibXML V1.70 and the output is different:
V1.58:
Attr: xml:lang="common"
Found lang? 1
Attr: xml:lang="en-US"
Found lang? 1
V1.70:
Attr: xml:lang="common"
Found lang?
Attr: xml:lang="en-US"
Found lang?
V1.70 only finds the attribute if I use $LANG="xml:lang".
Can anyone explain why LibXML V1.70 is handling my XML differently? Is there a change I can make to my code to make it behave the same when running with both V1.58 and V1.70? I can't change the XML document.
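For what it is worth, here is a sketch of the kind of change I have in mind. It assumes that getAttributeNS with the reserved XML namespace URI behaves the same in both versions, which I have not confirmed. The helper name get_xml_attr is my own invention, not part of XML::LibXML, and the document is parsed from a string only to keep the example self-contained.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace URI that the XML specification reserves for the "xml" prefix.
my $XML_NS = 'http://www.w3.org/XML/1998/namespace';

# Hypothetical helper (not part of XML::LibXML): look up an xml:* attribute
# without relying on how getAttribute() treats the prefix.
sub get_xml_attr {
    my ($node, $local_name) = @_;
    # First try a namespace-aware lookup.
    my $value = $node->getAttributeNS($XML_NS, $local_name);
    # Fall back to the prefixed name, then to the bare name.
    # Note: the bare-name fallback would also match an un-namespaced
    # attribute with the same name, if the document had one.
    $value = $node->getAttribute("xml:$local_name") unless defined $value;
    $value = $node->getAttribute($local_name) unless defined $value;
    return $value;
}

my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<sentences>
<sentencedefs xml:lang="common"/>
<sentencedefs xml:lang="en-US"/>
</sentences>
XML

my $doc = XML::LibXML->new->parse_string($xml);
my @langs = map { get_xml_attr($_, 'lang') }
    $doc->documentElement->getElementsByTagName('sentencedefs');
print "Found lang: $_\n" for @langs;
```

In my real code I would call get_xml_attr($lDefs, $LANG) in place of $lDefs->getAttribute($LANG), leaving the XML document untouched.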