There are multiple threads about converting XML to JSON in PHP, and I already have the following code, which works pretty well:
function jsonPrepareXml(object $domNode): void
{
    foreach ($domNode->childNodes as $node) {
        if ($node->hasChildNodes()) {
            jsonPrepareXml($node);
        } else {
            if ($domNode->hasAttributes() && strlen($domNode->nodeValue) !== 0) {
                $domNode->setAttribute("nodeValue", $node->textContent);
                $node->nodeValue = "";
            }
        }
    }
}

$dom = new \DOMDocument();
$dom->loadXML(FileHelpers::fileGetContents($file), LIBXML_NOCDATA);
jsonPrepareXml($dom);
$xmlData = $dom->saveXML();
$sxml = \simplexml_load_string($xmlData);
$json = \json_decode(
    \json_encode($sxml, JSON_THROW_ON_ERROR),
    null,
    512,
    JSON_THROW_ON_ERROR
);
Now I have run into the problem that in some XML files, text inside CDATA sections is truncated. I could not find what those files have in common, and the text is not always cut off after the same number of characters. When I copied only the CDATA section into an otherwise empty XML file for debugging, the whole text was read.
So I thought I would remove the LIBXML_NOCDATA constant, since libxml reads the whole text when it parses it as CDATA. But then the conversion to JSON fails, because the CDATA sections are not converted.
So I tried converting the CDATA nodes to normal text nodes like this in the jsonPrepareXml() function:
elseif ($node instanceof \DOMCdataSection) {
    $node = new \DOMText((string) $node->nodeValue);
}
But this does not change anything.
Are there any ideas on how to fix this issue? Of course, it would be great if the original function worked, but I was not able to fix it, even with different PHP or libxml versions, so I gave up on that. Currently, I'm on PHP 8.0.11.
Update: Until now I could not publish an XML file that triggers the error, because the files contained a lot of personal data. But now I have one XML file that shows the error quite nicely: https://drive.google.com/file/d/10iyiH1O6oKG9Zbv91He1_KlCQlhdeZoO/view?usp=sharing If I load the file with the following code, the output ends with 'Majapahit Empire, the city' at day 4.
<?php declare(strict_types=1);
$dom = new \DOMDocument();
$dom->loadXML(FileHelpers::fileGetContents($file), LIBXML_NOCDATA);
header("Content-type: text/plain");
echo $dom->saveXML();
So this happens even without my function that prepares the attributes for the JSON conversion. As stated, I can remove LIBXML_NOCDATA, but then I get empty nodes when converting to JSON.
So I am looking for a fix, or at least a workaround that converts all the CDATA nodes into normal text nodes.
The main issue really is the CDATA nodes, not the jsonPrepareXml function. I just wanted to use that function for the workaround.
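For reference, here is a minimal sketch of the kind of workaround I have in mind. It assumes the document is loaded without LIBXML_NOCDATA and that each CDATA section is then swapped for a text node inside the tree itself. Assigning a new \DOMText to the loop variable only rebinds that local variable and never touches the document, which would explain why my attempt above had no effect. The helper name cdataToText() and the sample XML are made up for illustration, and I have not verified that this avoids the truncation in the real file.

```php
<?php declare(strict_types=1);

/**
 * Replace every CDATA section in the document with an equivalent text node.
 * Hypothetical helper; the name is not part of the original code.
 */
function cdataToText(\DOMDocument $dom): void
{
    $xpath = new \DOMXPath($dom);

    // text() matches plain text nodes as well as CDATA sections. Copy the
    // result into an array first so that replacing nodes cannot disturb
    // the iteration.
    $nodes = iterator_to_array($xpath->query('//text()'), false);

    foreach ($nodes as $node) {
        if ($node instanceof \DOMCdataSection) {
            // createTextNode() builds a node owned by this document;
            // replaceChild() actually swaps it into the tree.
            $text = $dom->createTextNode($node->data);
            $node->parentNode->replaceChild($text, $node);
        }
    }
}

// Made-up sample document; note that LIBXML_NOCDATA is NOT passed here.
$dom = new \DOMDocument();
$dom->loadXML('<root><day><![CDATA[Majapahit Empire, the city & more]]></day></root>');

cdataToText($dom);

$xml  = $dom->saveXML();
$json = json_encode(simplexml_load_string($xml), JSON_THROW_ON_ERROR);
```

In my real code the call to cdataToText($dom) would go right before jsonPrepareXml($dom), so that the attribute handling only ever sees normal text nodes.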