
I need to change some text in an XML file using PHP, so I wrote code to:

1- get the file

2- replace the text

3- save the file under a different name.

The problem is that I'm having trouble replacing some of the text in the XML file.

I can replace simple strings, but I can't replace text that contains characters like '<'. Below are the real code and files.

Original XML path: http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml

1) This code just changes the text Inmuebles to xxxxxxxx. This works fine.

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    'Inmuebles' => 'xxxxxxxx'
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

2) Now, if I use this code to change the text <Table Name="Inmuebles"> to <xxxxxxxx>, I get an ERROR 500.

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    '<Table Name="Inmuebles">' => '<xxxxxxxx>'
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

3) In the same way, if I use this code to remove the text <Publicacion>, I get an ERROR 500.

$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$xml = file_get_contents($xml_external_path);

$response = strtr($xml, array(
    '<Publicacion>' => ''
));

$newXml = $response;

$newXml = simplexml_load_string( $newXml );
$newXml->asXml('/home/csainmobiliaria/www/pisos-NEW.xml');

This is the final result I need to get: http://www.csainmobiliaria.com/imagenes/fotos/pisos-OK.xml

  • Changing <Table Name="Inmuebles"> to <xxxxxxxx> leaves the closing </Table> without a matching opening tag, and there is no closing </xxxxxxxx> at all. Use the parser to do this. Also, when you get an ERROR 500, check your error logs; they will tell you what is wrong. If they don't, look at the manual for the error reporting functions. The <Publicacion> approach has the same issue. Don't use string functions on structured data (CSV, JSON, XML, etc.); use the appropriate parsers. Commented Jan 25, 2019 at 12:48
  • @user3783243 I'm afraid I don't know what 'parsers' are. Do you mean a string search function? Commented Jan 25, 2019 at 13:15
  • simplexml is a parser. You should bring the file as it is into that, restructure it as needed, then output it. (There are other parsers as well if you don't like that one) Commented Jan 25, 2019 at 14:30
  • Possible duplicate of How do you parse and process HTML/XML in PHP? Commented Jan 25, 2019 at 14:30
  • XSLT is a template language for exactly this use case: it transforms one XML document into another XML, HTML, or text document. PHP has an extension (ext/xsl) for it. Commented Jan 25, 2019 at 18:16
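To make the first comment concrete, here is a minimal, self-contained sketch. It assumes a tiny hypothetical document that only mimics the names from the question (<Publicacion>, <Table Name="...">, <Inmueble>); the values are invented. It shows that the strtr() rename from 2) produces XML that simplexml_load_string() rejects: it returns false, so the following ->asXml() call is made on false, which is a fatal error that the server reports as a 500. It then does the same restructuring through DOMDocument and DOMXPath instead.

```php
<?php
// Hypothetical mini-feed: element names follow the question, values are invented.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<Publicacion>
  <Table Name="Inmuebles">
    <Inmueble><IdPisoExterno>100002</IdPisoExterno></Inmueble>
  </Table>
  <Table Name="Agencias">
    <Agencia><Nombre>Demo</Nombre></Agencia>
  </Table>
</Publicacion>
XML;

// Collect libxml errors instead of printing warnings.
libxml_use_internal_errors(true);

// String replacement renames only the opening tag, so </Table> no longer matches.
$broken = strtr($sample, ['<Table Name="Inmuebles">' => '<xxxxxxxx>']);
$result = simplexml_load_string($broken); // false: the document is not well-formed
$errors = libxml_get_errors();            // the parser's error messages
libxml_clear_errors();

// Parser-based alternative: copy the records into a new <Inmuebles> root.
$source = new DOMDocument();
$source->loadXML($sample);
$xpath = new DOMXPath($source);

$target = new DOMDocument('1.0', 'UTF-8');
$target->formatOutput = true;
$root = $target->appendChild($target->createElement('Inmuebles'));
foreach ($xpath->query('/Publicacion/Table[@Name="Inmuebles"]/*') as $record) {
    $root->appendChild($target->importNode($record, true)); // deep copy
}

$output = $target->saveXML();
echo $output;
```

Because the DOM version never edits the markup as text, the opening and closing tags can't get out of step.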

3 Answers


DOMDocument lets you copy whole structures of nodes. Rather than copying every detail individually (which can miss data when the specification changes), you can copy an entire node (such as <Inmueble>) from one document to another using importNode(), whose second parameter says whether the element's full content should be copied as well. This approach also lets you copy any of the tables with the same function, without code changes...

function extractData ( $sourceFile, $table )    {
    // Load source data
    $source = new DOMDocument();
    $source->load($sourceFile);
    $xp = new DOMXPath($source);

    // Create new data document
    $newFile = new DOMDocument();
    $newFile->formatOutput = true;
    // Create base element with the table name in new document
    $newRoot = $newFile->createElement($table);
    $newFile->appendChild($newRoot);

    // Find the records to copy
    $records = $xp->query('//Table[@Name="'.$table.'"]/*');
    foreach ( $records as $record ) {
        // Import the node (true = deep copy, including all children) and append it to the new document
        $newRoot->appendChild($newFile->importNode($record, true));
    }
    // Return the source of the XML
    return $newFile->saveXML();
}

echo extractData ($xml_external_path, "Inmuebles");
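As a quick illustration of the importNode() parameter mentioned above, this sketch imports the same hypothetical <Inmueble> record twice: once with false (only the element node itself) and once with true (the element plus its whole subtree). The record values are invented.

```php
<?php
// Hypothetical record shaped like one <Inmueble> from the feed.
$source = new DOMDocument();
$source->loadXML(
    '<Inmueble><IdPisoExterno>100003</IdPisoExterno><TipoInmueble>CHALET</TipoInmueble></Inmueble>'
);
$record = $source->documentElement;

$target = new DOMDocument();

// $deep = false: only the element node itself is imported, without its children.
$shallow = $target->importNode($record, false);
// $deep = true: the element is imported together with its whole subtree.
$deep = $target->importNode($record, true);

$shallowXml = $target->saveXML($shallow);
$deepXml = $target->saveXML($deep);
echo $shallowXml, PHP_EOL, $deepXml, PHP_EOL;
```

The shallow copy serializes as an empty <Inmueble/>, which is why extractData() passes true.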

You could alter the method to return the document as DOMDocument or even a SimpleXML version if you wished to process it further.

For SimpleXML, change the return to...

return simplexml_import_dom($newRoot);

and then you can call it as...

$ret = extractData ($xml_external_path, "Inmuebles");
echo $ret->asXML();

Or if you just want a fixed way of doing this, you can remove the XPath and just use getElementsByTagName() to find the nodes to copy...

$source = new DOMDocument();
$source->load($xml_external_path);

$newFile = new DOMDocument();
$newRoot = $newFile->createElement("Inmuebles");
$newFile->appendChild($newRoot);

// Find the records to copy
foreach ( $source->getElementsByTagName("Inmueble") as $record ) {
    $newRoot->appendChild($newFile->importNode($record, true));
}
echo $newFile->saveXML();

To save the result under a new file name, I've added a third parameter to the function. This version doesn't return anything at all; it loads the file, copies the records (adding a sample Title element to each one and fixing the Foto paths) and saves the result to the new file name...

function extractData ( $sourceFile, $table, $newFileName )    {
    // Load source data
    $source = new DOMDocument();
    $source->load($sourceFile);
    $xp = new DOMXPath($source);

    // Create new file document
    $newFile = new DOMDocument();
    $newFile->formatOutput = true;
    // Create base element with the table name in new document
    $newRoot = $newFile->createElement($table);
    $newFile->appendChild($newRoot);

    // Find the records to copy
    $records = $xp->query('//Table[@Name="'.$table.'"]/*');
    foreach ( $records as $record ) {
        // Import the node to copy and append it to new document
        $importNode = $newFile->importNode($record, true);
        // Add new content
        $importNode->appendChild($newFile->createElement("Title", "value"));
        $newRoot->appendChild($importNode);
    }

    // Update Foto elements
    $xp = new DOMXPath($newFile);
    $fotos = $xp->query("//*[starts-with(local-name(), 'Foto')]");
    foreach ( $fotos as $foto ) {
        $path = $foto->nodeValue;
        if( substr($path, 0, 5) == "/www/" )    {
            $path = substr($path,4);
        }
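        // Replace the node with a new version that keeps its original name (Foto1, Foto2, ...)
        $newFoto = $newFile->createElement($foto->nodeName);
        // A text node is escaped on output, so paths containing "&" are safe
        $newFoto->appendChild($newFile->createTextNode($path));
        $foto->parentNode->replaceChild($newFoto, $foto);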
    }  

    $newFile->save($newFileName);
}
$xml_external_path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos.xml';
$xml_external_savepath = 'saveFile.xml';

extractData ($xml_external_path, "Inmuebles", $xml_external_savepath);
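The Foto step in that function can also be tried on its own. Below is a sketch with a hypothetical helper, fixFotoPaths(), which is not part of the answer: it applies the same "/www/" check but, as an assumption about the intent, keeps each element's original name (Foto1, Foto2, ...) and rewrites the text in place instead of replacing the node. The sample paths are invented.

```php
<?php
// Hypothetical helper, not part of the answer above.
// Strips a leading "/www" from the text of every element whose name starts
// with "Foto", keeping the element's original name.
function fixFotoPaths(DOMDocument $doc): void
{
    $xp = new DOMXPath($doc);
    foreach ($xp->query("//*[starts-with(local-name(), 'Foto')]") as $foto) {
        $path = $foto->textContent;
        if (substr($path, 0, 5) === '/www/') {
            $path = substr($path, 4); // "/www/a/1.jpg" -> "/a/1.jpg"
        }
        // Remove the old text, then add the new value as a text node
        // (text nodes are escaped on output, so "&" is safe).
        while ($foto->firstChild !== null) {
            $foto->removeChild($foto->firstChild);
        }
        $foto->appendChild($doc->createTextNode($path));
    }
}

// Invented sample paths.
$doc = new DOMDocument();
$doc->loadXML(
    '<Inmueble><Foto1>/www/imagenes/fotos/1.jpg</Foto1>'
    . '<Foto2>http://example.com/2.jpg</Foto2></Inmueble>'
);
fixFotoPaths($doc);

$result = $doc->saveXML($doc->documentElement);
echo $result;
```

Paths that don't start with "/www/" are left untouched.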

Comments

@Nige_Ren I'm trying your first code. I need to know how to save the new XML under another name.
If you mean saving the XML to a file, you can just use file_put_contents("outputFileName.xml", extractData ($xml_external_path, "Inmuebles"));
@Nige_Ren Thanks. Where exactly do I insert this line? Can you post the complete function?
I've added a new version of the function where you can pass the file name to save the result to.
@Nige_Ren thanks, your last code function extractData ( $sourceFile, $table, $newFileName )... works fine.

You can copy the nodes you need instead of removing the excess elements. For example, you can copy the Inmuebles node with the help of SimpleXML:

$path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$content = file_get_contents($path);
$sourceXML = new SimpleXMLElement($content);

$targetXML = new SimpleXMLElement("<Inmuebles></Inmuebles>");

$items = $sourceXML->xpath('Table[@Name=\'Inmuebles\']');
foreach ($items as $item) {
    foreach ($item->Inmueble as $inmueble) {
        $node  = $targetXML->addChild('Inmueble');
        $node->addChild('IdInmobiliariaExterna', $inmueble->IdInmobiliariaExterna);
        $node->addChild('IdPisoExterno', $inmueble->IdPisoExterno);
        $node->addChild('FechaHoraModificado', $inmueble->FechaHoraModificado);
        $node->addChild('TipoInmueble', $inmueble->TipoInmueble);
        $node->addChild('TipoOperacion', $inmueble->TipoOperacion);
    }
}

echo $targetXML->asXML();
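If the feed gains new fields, listing them one by one will silently drop them. A possible variation, sketched below on an invented mini-document (the <Referencia> element and its value are hypothetical), copies every child of each <Inmueble> by name. As far as I know, addChild() does not escape "&" in its value, so the text is passed through htmlspecialchars() first. Like the original loop, this copies string values only.

```php
<?php
// Invented mini-document; <Referencia> and its value are hypothetical.
$sourceXML = new SimpleXMLElement(
    '<Publicacion><Table Name="Inmuebles"><Inmueble>'
    . '<IdPisoExterno>100002</IdPisoExterno>'
    . '<TipoInmueble>PISO</TipoInmueble>'
    . '<Referencia>A&amp;B-7</Referencia>'
    . '</Inmueble></Table></Publicacion>'
);

$targetXML = new SimpleXMLElement('<Inmuebles/>');

foreach ($sourceXML->xpath("Table[@Name='Inmuebles']/Inmueble") as $inmueble) {
    $node = $targetXML->addChild('Inmueble');
    // Copy every child element by name instead of listing the fields by hand.
    foreach ($inmueble->children() as $name => $child) {
        // Escape the text first: addChild() does not escape "&" in its value.
        $node->addChild($name, htmlspecialchars((string) $child, ENT_XML1));
    }
}

$out = $targetXML->asXML();
echo $out;
```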

Also, as @ThW said in the comments, you can use XSLT, for example:

$path = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$content = file_get_contents($path);
$sourceXML = new SimpleXMLElement($content);

$xslt='<?xml version="1.0" encoding="ISO-8859-1"?>
         <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
         <xsl:output method="xml" indent="yes"/>

         <xsl:template match="Table[@Name=\'Inmuebles\']">
             <Inmuebles>
                 <xsl:copy-of select="node()"/>
             </Inmuebles>
         </xsl:template>

         <xsl:template match="Table[@Name=\'Agencias\']"/>
</xsl:stylesheet>';


$xsl = new SimpleXMLElement($xslt);

$processor = new XSLTProcessor;
$processor->importStyleSheet($xsl);
$result = $processor->transformToXML($sourceXML);
$targetXML = new SimpleXMLElement($result);
echo $targetXML->asXML();

Comments

The first code works great. Just one question: one of the elements contains HTML code (<br/>) that is not copied to the new XML. How can I sort this out? Thanks.
@JPashs Can you post an example of the XML?
Here is the URL of the real XML: csainmobiliaria.com/imagenes/fotos/pisos.xml And here is a screenshot where you can see the HTML tags: postimg.cc/XrQDw9Xt After I run the code, the <br/> HTML tag is removed from the text.
@Maxim_Fedorov Did you see my last comment?
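@JPashs In <Descripcion>3 DORMITORIOS,1 CUARTO DE BAÑO <br></br></Descripcion>, the <br></br> is a child element, not text, and the code above copies only the string value of each field with addChild(), so the tag is dropped. If the element is meant to carry HTML as text, the HTML must be in a <![CDATA[]]> block; otherwise you have to copy the node itself rather than its string value.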
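To see the difference that comment describes, here is a sketch on a hypothetical <Descripcion> value containing <br></br> (the text is invented). Casting the element to a string, which is what happens when it is passed to addChild(), keeps only its own text, so the <br> disappears; importing the node itself through DOM (dom_import_simplexml() plus importNode()) keeps it.

```php
<?php
// Hypothetical <Descripcion> with mixed content (text plus a <br> element).
$source = new SimpleXMLElement(
    '<Inmueble><Descripcion>3 DORMITORIOS <br></br>GARAJE</Descripcion></Inmueble>'
);
$target = new SimpleXMLElement('<Inmueble/>');

// 1) Copying the string value: the cast keeps only the element's own text,
//    so the <br> child element is lost.
$target->addChild('DescripcionTexto', htmlspecialchars((string) $source->Descripcion, ENT_XML1));

// 2) Copying the node itself through DOM keeps the <br> child.
$targetDom = dom_import_simplexml($target);
$sourceDom = dom_import_simplexml($source->Descripcion);
$targetDom->appendChild($targetDom->ownerDocument->importNode($sourceDom, true));

$out = $target->asXML();
echo $out;
```

Only the DOM copy still contains the <br/> element.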

Consider XSLT again: the W3C-standard, special-purpose language designed to transform XML files into whatever structure the user needs, such as your #1-3 requirements. Like that other popular declarative language, SQL, XSLT is not limited to PHP; it is portable to other application layers (Java, C#, Python, Perl, R) and to standalone XSLT 1.0, 2.0, and 3.0 processors.

With this approach, XSLT's recursive template matching lets you avoid foreach loops, if logic, and repeated addChild or appendChild calls at the application layer.

XSLT (save as an .xsl file, a special .xml file, or embedded string; portable to other interfaces beyond PHP)

<?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" indent="yes" encoding="ISO-8859-1"/>
     <xsl:strip-space elements="*"/>

     <!-- WALK DOWN TREE FROM ROOT -->
     <xsl:template match="Publicacion">
        <xsl:apply-templates select="Table"/>
     </xsl:template>

     <xsl:template match="Table[@Name='Inmuebles']">
         <Inmuebles>
             <xsl:apply-templates select="*"/>
         </Inmuebles>
     </xsl:template>

     <!-- EMPTY TEMPLATE TO REMOVE SPECIFIED NODES -->
     <xsl:template match="Table[@Name='Agencias']"/>

     <!-- RETURN ONLY FIRST FIVE NODES -->
     <xsl:template match="Table/*">
         <Inmuebles>
             <xsl:copy-of select="*[position() &lt;= 5]"/>
         </Inmuebles>
     </xsl:template>

</xsl:stylesheet>


PHP (using the php_xsl library)

// LOAD XML SOURCE
$url = 'http://www.csainmobiliaria.com/imagenes/fotos/pisos-NOK.xml';
$web_data = file_get_contents($url);
$xml = new SimpleXMLElement($web_data);

// LOAD XSL SCRIPT
$xsl = simplexml_load_file('/path/to/script.xsl');

// XSLT TRANSFORMATION
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); 
$newXML = $proc->transformToXML($xml);

// OUTPUT TO CONSOLE
echo $newXML;

// SAVE TO FILE
file_put_contents('Output.xml', $newXML);

And as the great XSLT guru, @Dimitre Novatchev, usually ends his posts: the wanted, correct result is produced:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Inmuebles>
   <Inmuebles>
      <IdInmobiliariaExterna>B45695855</IdInmobiliariaExterna>
      <IdPisoExterno>100002</IdPisoExterno>
      <FechaHoraModificado>30/11/2018</FechaHoraModificado>
      <TipoInmueble>PISO</TipoInmueble>
      <TipoOperacion>3</TipoOperacion>
   </Inmuebles>
   <Inmuebles>
      <IdInmobiliariaExterna>B45695855</IdInmobiliariaExterna>
      <IdPisoExterno>100003</IdPisoExterno>
      <FechaHoraModificado>30/11/2018</FechaHoraModificado>
      <TipoInmueble>CHALET</TipoInmueble>
      <TipoOperacion>4</TipoOperacion>
   </Inmuebles>
</Inmuebles>
