
I have a large XML file (400 MB) that I need to update daily. For every main element I run a SELECT plus an INSERT/UPDATE query against the database. When I run the script, it processes about 26 main elements per minute at first, but it keeps slowing down: after 500 main elements it is down to roughly 10 elements per minute.

$xml_reader = new XMLReader;
$xml_reader->open("feed.xml");

// move the pointer to the first product
while ($xml_reader->read() && $xml_reader->name != 'SHOPITEM');

// loop through the products
while ($xml_reader->name == 'SHOPITEM') {
    // load the current element into SimpleXML
    $feed = simplexml_load_string($xml_reader->readOuterXML());

    // now you can use the SimpleXML object,
    // e.g. $feed->PRODUCTNO

    // SELECT, UPDATE/INSERT HERE

    // move the pointer to the next product
    $xml_reader->next('SHOPITEM');
}

// don't forget to close the file
$xml_reader->close();

This is the XML:

<?xml version="1.0" encoding="utf-8"?>
<SHOP>
    <SHOPITEM> 
        <ITEM_ID>2600000394161</ITEM_ID> 
        (+ 15 more elements like this) 
        <PARAM>
            <PARAM_NAME><![CDATA[some data here]]></PARAM_NAME> 
            <VAL><![CDATA[some data here]]></VAL> 
        </PARAM> 
        (+ 10 more elements like this) 
    </SHOPITEM> 
    (lot of shopitems here) 
</SHOP>

I can't use SimpleXML on the whole file because of my RAM. Is there any faster PHP XML parser, or how do big sites (e.g. price comparison sites) handle this? Would better hardware help? My CPU is at 10% and RAM at 80% while the XML is being processed.

  • If your XML has a fixed structure, it can only be faster to parse it manually. XML parsers are made to be multipurpose; they'll use more CPU and RAM than you actually need. Can you post an example of your XML structure? Commented Jul 29, 2015 at 13:57
  • Sounds like you should use another data store than a single XML file. At least split it into many files, or more likely, just use a DB. Commented Jul 29, 2015 at 13:58
  • If you rewrite the XML to a SQLite db, it will go MUCH faster! Food for thought (see the sketch after these comments)... Commented Jul 29, 2015 at 17:08
  • I had a problem with a 2 GB+ XML file that took way too much time to edit, which I had to do daily. When it was rewritten as a SQLite file instead, it went from several minutes to mere seconds! (And the save file went from 2 GB+ to 10 MB or so, amazing.) Commented Jul 29, 2015 at 17:08
  • Also, I doubt your CPU is at 10% for XML processing... it's probably "100% (minus hard drive IO) of 1 core", and you have 8+ (logical) CPU cores ;) Commented Jul 29, 2015 at 17:17
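A minimal sketch of what the SQLite comments above suggest: stream the feed with XMLReader and mirror each item into a local SQLite file. The shop.sqlite filename and the items table are illustrative assumptions, not anything from the question:

$db = new PDO('sqlite:shop.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS items (item_id TEXT PRIMARY KEY, xml TEXT)');
$insert = $db->prepare('INSERT OR REPLACE INTO items (item_id, xml) VALUES (?, ?)');

$reader = new XMLReader;
$reader->open('feed.xml');
while ($reader->read() && $reader->name != 'SHOPITEM');

// one big transaction makes SQLite dramatically faster than per-row commits
$db->beginTransaction();
while ($reader->name == 'SHOPITEM') {
    $item = simplexml_load_string($reader->readOuterXML());
    $insert->execute([(string)$item->ITEM_ID, $item->asXML()]);
    $reader->next('SHOPITEM');
}
$db->commit();
$reader->close();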

3 Answers


Consider using an XML database (e.g. eXist or BaseX). At this sort of size, it will be much more efficient.
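For illustration, a BaseX database could be filled and queried from PHP over its REST interface. This is a sketch under assumptions, not anything from the answer: a BaseX HTTP server on the default localhost:8984, default admin/admin credentials, and a database named shop:

// create/replace a BaseX database from the feed via the REST API
$ch = curl_init('http://localhost:8984/rest/shop');
curl_setopt($ch, CURLOPT_PUT, true);
curl_setopt($ch, CURLOPT_INFILE, fopen('feed.xml', 'rb'));
curl_setopt($ch, CURLOPT_INFILESIZE, filesize('feed.xml'));
curl_setopt($ch, CURLOPT_USERPWD, 'admin:admin'); // default credentials, adjust
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);

// run an XQuery against it, e.g. fetch one item by ID
$ch = curl_init('http://localhost:8984/rest/shop?query=' .
    urlencode('//SHOPITEM[ITEM_ID = "2600000394161"]'));
curl_setopt($ch, CURLOPT_USERPWD, 'admin:admin');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
echo curl_exec($ch);
curl_close($ch);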


You can speed up the XML parsing a little by expanding the current node from XMLReader into a DOM element node; that element node can then be converted into a SimpleXMLElement.

$xml_reader = new XMLReader;
$xml_reader->open("feed.xml");

$dom = new DOMDocument();
$xpath = new DOMXPath($dom);

// move the pointer to the first product
while ($xml_reader->read() && $xml_reader->name != 'SHOPITEM');

// loop through the products
while ($xml_reader->name == 'SHOPITEM') {
  // expand to DOM element
  $element = $xml_reader->expand($dom);

  // use Xpath with the $element as context node
  $itemId = $xpath->evaluate('string(ITEM_ID)', $element);

  // or import it into a SimpleXMLElement
  $item = simplexml_import_dom($element);

  //SELECT, UPDATE/INSERT HERE

  // move the pointer to the next product
  $xml_reader->next('SHOPITEM');
}
// don't forget to close the file
$xml_reader->close();

To speed up the process you should collect the data and do mass inserts. Most DBMSs support this in one way or another. Fewer SQL queries mean less work for the database, but you need memory to collect the data, so you will have to find a balance.
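As a rough sketch of the mass-insert idea with PDO and MySQL (the products table, its columns, and the batch size of 500 are assumptions for illustration):

// flush buffered rows as one multi-row upsert (hypothetical products table)
function flush_rows(PDO $db, array $rows) {
    if (!$rows) return;
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));
    $sql = "INSERT INTO products (item_id, name) VALUES $placeholders"
         . " ON DUPLICATE KEY UPDATE name = VALUES(name)";
    $params = [];
    foreach ($rows as $row) {
        $params[] = $row['item_id'];
        $params[] = $row['name'];
    }
    $db->prepare($sql)->execute($params);
}

$buffer = [];
while ($xml_reader->name == 'SHOPITEM') {
    $item = simplexml_load_string($xml_reader->readOuterXML());
    $buffer[] = ['item_id' => (string)$item->ITEM_ID, 'name' => (string)$item->PRODUCTNO];
    if (count($buffer) >= 500) {   // flush every 500 items
        flush_rows($db, $buffer);
        $buffer = [];
    }
    $xml_reader->next('SHOPITEM');
}
flush_rows($db, $buffer);          // flush the remainder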

Another possibility is to generate a local file containing all the SQL statements and execute it using a console client. This can be really fast, but it is a potential security risk.
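A sketch of that idea, assuming the MySQL console client and the same hypothetical products table; note that every value must be quoted carefully, which is where the security risk lives:

// write one INSERT per item to a dump file, then run: mysql shop < /tmp/feed.sql
$out = fopen('/tmp/feed.sql', 'w');
while ($xml_reader->name == 'SHOPITEM') {
    $item = simplexml_load_string($xml_reader->readOuterXML());
    $id   = $db->quote((string)$item->ITEM_ID);    // quote via an open PDO handle
    $name = $db->quote((string)$item->PRODUCTNO);
    fwrite($out, "INSERT INTO products (item_id, name) VALUES ($id, $name) ON DUPLICATE KEY UPDATE name = VALUES(name);\n");
    $xml_reader->next('SHOPITEM');
}
fclose($out);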


I cannot say whether the DOMDocument methods would be quicker than SimpleXML, but to use DOMDocument you would do something like below (untested). Bear in mind that DOMDocument, like SimpleXML, loads the whole document into memory:

$xmlfile='/path/to/feed.xml';

if( file_exists( $xmlfile ) ){
    libxml_use_internal_errors( TRUE );
    $dom = new DOMDocument('1.0','utf-8');
    $dom->validateOnParse=false;
    $dom->standalone=true;
    $dom->preserveWhiteSpace=true;
    $dom->strictErrorChecking=false;
    $dom->substituteEntities=false;
    $dom->recover=true;
    $dom->formatOutput=false;
    $dom->load( $xmlfile ); // load() takes a file path; loadXML() expects a string of XML
    $parse_errs=serialize( libxml_get_last_error() );
    libxml_clear_errors();

    $xpath=new DOMXPath( $dom );

    $items = $dom->getElementsByTagName('SHOPITEM');

    foreach( $items as $node ){

        echo $node->nodeValue;

        /* Each node is likely to have children */
        $children=$xpath->query('PRODUCTNO',$node);
        foreach( $children as $child ){
            echo $child->nodeValue;
        }

        /*
            mysql cmds
            ----------
            select , update, insert
        */
    }

    $dom=null;
}

1 Comment

  • Thank you. I will try that and reply about speed.
