I have an XML document like the one below; the file is larger than 400 MB.

My problem is that I cannot get XMLReader to stay within the memory limit. The server runs PHP 7.2 with a 512 MB limit.

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetModifiedResponse xmlns="http://host.com">
<ProductList>
<UpdatedProducts>
  <ProductId>1</ProductId>
  <ProductId>2</ProductId>
  <ProductId>3</ProductId>
  <ProductId>4</ProductId>
</UpdatedProducts>
<RemovedProducts>
  <ProductId>5</ProductId>
  <ProductId>6</ProductId>
  <ProductId>7</ProductId>
  <ProductId>8</ProductId>
</RemovedProducts>
</ProductList>
..

This is roughly my script. The issue is that the whole "UpdatedProducts" element is loaded at once, which maxes out the RAM. I need the same handling for RemovedProducts, and both need to be processed in the loop. How can I solve this, if possible without adding more RAM to the server (or setting memory_limit to -1)?

    while ($xml->name == 'UpdatedProducts') {
      $elm = new \SimpleXMLElement($xml->readOuterXml());

      foreach ($elm->ProductId as $product) {
        $this->saveToDb($product);
      }

      $xml->next('UpdatedProducts');
    }

Update:

This is the code right now:

    $xml = new \XMLReader();
    $xml->open(__DIR__ . '/../../var/tmp/out.xml');

    while ($xml->read()) {
      while ($xml->name == 'UpdatedProducts') {
        while ($xml->read() && $xml->name != 'ProductId');
        while ($xml->name == 'ProductId') {
          $this->saveToDb($xml->readInnerXml(), 'update');
          $xml->next('ProductId');
        }
        $xml->next('UpdatedProducts');
      }
      while ($xml->name == 'RemovedProducts') {
        while ($xml->read() && $xml->name != 'ProductId');
        while ($xml->name == 'ProductId') {
          $this->saveToDb($xml->readInnerXml(), 'remove');
          $xml->next('ProductId');
        }
        $xml->next('RemovedProducts');
      }
    }

  • So, does your script work with memory_limit(-1)? Commented Mar 2, 2020 at 14:04
  • You mention XMLReader but you're eventually using SimpleXML. The former should work fine with files of any size; the latter won't. Commented Mar 2, 2020 at 14:06
  • $xml = new \XMLReader(); $xml->open(__DIR__ . '/../../var/tmp/out.xml'); Commented Mar 2, 2020 at 14:09
  • Aksen P, yes it works with memory_limit = -1 Commented Mar 4, 2020 at 7:16

1 Answer

Rather than using SimpleXML to fetch all of the nodes within <UpdatedProducts>, you could nest the same kind of loop so that it reads inside this node for the <ProductId> nodes. This means that the inner loop gets one node at a time...

while ($xml->name == 'UpdatedProducts') {
    while ($xml->read() && $xml->name !== 'ProductId');
    while ($xml->name == 'ProductId') {
        echo $xml->readOuterXml().PHP_EOL;
        $xml->next('ProductId');
    }
    $xml->next('UpdatedProducts');
}
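
To illustrate why this helps: readOuterXml() returns the serialized subtree of the current node, so calling it on <UpdatedProducts> builds a string covering every child at once, whereas calling it on a single <ProductId> only builds a few bytes. A minimal sketch on a tiny inline document (the function name outerXmlSizes is made up for this illustration and is not part of XMLReader):

```php
<?php
// Hypothetical helper, for illustration only: returns the byte length of
// readOuterXml() for the root element and for its first child element.
function outerXmlSizes(string $xml): array
{
    $reader = new \XMLReader();
    $reader->XML($xml);

    $reader->read();                          // positioned on the root element
    $containerBytes = strlen($reader->readOuterXml());

    // Advance to the first child element, skipping any non-element nodes.
    while ($reader->read() && $reader->nodeType !== \XMLReader::ELEMENT);
    $childBytes = strlen($reader->readOuterXml());

    $reader->close();
    return [$containerBytes, $childBytes];
}

$sample = '<UpdatedProducts><ProductId>1</ProductId><ProductId>2</ProductId></UpdatedProducts>';
[$container, $child] = outerXmlSizes($sample);
echo "container: $container bytes, one child: $child bytes", PHP_EOL;
```

The container string grows with the number of children, while the per-child string stays small, which is the difference that matters for a 400 MB file.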

To handle both types, I've tried to reduce it to a single loop. It's not ideal, but it seems to work...

$xml = new \XMLReader();
$xml->open(__DIR__ . '/../../var/tmp/out.xml');
while ($xml->read() && $xml->name != 'UpdatedProducts');
$type = "update";
while ($xml->read() && $xml->name != 'ProductId');
while ($xml->name == 'ProductId') {
    $id = $xml->readInnerXml();
    if ( !empty($id) )  {
        $this->saveToDb($id, $type);
    }
    while ($xml->read() && $xml->name != 'ProductId'
            && $xml->name != 'RemovedProducts');
    if ( $xml->name == 'RemovedProducts' )  {
        $type = "remove";
        while ($xml->read() && $xml->name != 'ProductId');
    }
}
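
A variation on the single loop, offered as a sketch under assumptions rather than a tested drop-in: checking nodeType distinguishes opening tags from closing tags, so no guard against empty reads is needed, and one pass covers both sections. The function name streamProductIds and the generator shape are my own invention for this example; only the XMLReader calls are real API.

```php
<?php
// Hypothetical helper: yields [type, id] pairs for every <ProductId> found
// inside <UpdatedProducts> ("update") or <RemovedProducts> ("remove").
function streamProductIds(string $path): \Generator
{
    $reader = new \XMLReader();
    if (!$reader->open($path)) {
        throw new \RuntimeException("Cannot open $path");
    }

    $sections = ['UpdatedProducts' => 'update', 'RemovedProducts' => 'remove'];
    $type = null; // section the reader is currently inside, if any

    try {
        while ($reader->read()) {
            $name = $reader->localName;

            if ($reader->nodeType === \XMLReader::ELEMENT) {
                if (isset($sections[$name])) {
                    // An empty <UpdatedProducts/> has no closing tag to reset on.
                    $type = $reader->isEmptyElement ? null : $sections[$name];
                } elseif ($name === 'ProductId' && $type !== null) {
                    // readString() returns only this element's text content.
                    yield [$type, $reader->readString()];
                }
            } elseif ($reader->nodeType === \XMLReader::END_ELEMENT
                && isset($sections[$name])) {
                $type = null; // left the section
            }
        }
    } finally {
        $reader->close();
    }
}
```

Inside the class from the question it could be consumed with `foreach (streamProductIds($path) as [$type, $id]) { $this->saveToDb($id, $type); }`, so only one id is held in memory at a time.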

There is an alternative: a library I've written that wraps XMLReader (at https://github.com/NigelRel3/XMLReaderReg). You will have to download it, as there is no Composer package yet. Copy the XMLReaderReg.php script into your project and add

require_once "XMLReaderReg.php";

then you can use...

$reader = new XMLReaderReg();
$reader->open(__DIR__ ."/../../var/tmp/out.xml");

$reader->process([
    '.*/UpdatedProducts/ProductId' => function (SimpleXMLElement $data): void {
        $this->saveToDb((string)$data, "update");
    },
    '.*/RemovedProducts/ProductId' => function (SimpleXMLElement $data): void {
        $this->saveToDb((string)$data, "remove");
    },
]);

$reader->close();

1 Comment

How can I iterate over both UpdatedProducts and RemovedProducts with this? Right now I have added the same code a second time, with UpdatedProducts changed to RemovedProducts, and run that loop afterwards. That ends in "Allowed memory size of 536870912 bytes exhausted".
