2

I'm migrating a large WordPress site to a custom CMS. I need to extract information from a large (20 MB+) XML file exported from WordPress.

  1. I don't have any experience with XML in PHP, and I don't know how to start reading the file.

  2. The WordPress file contains structures like this:

    <excerpt:encoded><![CDATA[Encoded text here]]></excerpt:encoded>
    

and I don't know how to handle this in PHP.

5 Answers

4

SimpleXML will probably do the job:

    // "element" and "name" are placeholders; use the tag names from your file.
    $xml = simplexml_load_file('big_xml_file.xml');
    foreach ($xml->element as $el) {
        echo $el->name;
    }

See php.net for more info
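For a WordPress export specifically, the posts usually sit under channel->item, and prefixed elements such as excerpt:encoded are reached through children(). The following is a minimal sketch, not a definitive implementation: the inline sample, its namespace URIs, and the helper name readItems() are assumptions for illustration, so check them against the header of your own export file.

```php
<?php
// Inline sample standing in for the real export (layout and URIs are assumed).
$sample = <<<'XML'
<rss version="2.0"
     xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/"
     xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Hello world</title>
      <excerpt:encoded><![CDATA[Encoded text here]]></excerpt:encoded>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: collect a few fields from every <item>.
function readItems(SimpleXMLElement $xml): array
{
    $rows = [];
    foreach ($xml->channel->item as $item) {
        // children('prefix', true) selects child elements by namespace prefix.
        $excerpt = $item->children('excerpt', true);
        $wp = $item->children('wp', true);
        $rows[] = [
            'title'   => (string) $item->title,
            // Casting to string yields the text wrapped in the CDATA section.
            'excerpt' => (string) $excerpt->encoded,
            'type'    => (string) $wp->post_type,
        ];
    }
    return $rows;
}

$rows = readItems(simplexml_load_string($sample));
print_r($rows);
```

With the real file, swap simplexml_load_string($sample) for simplexml_load_file() and the path to your export.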


1

Unfortunately, your XML example didn't come through.

PHP 5 ships with several extensions for working with XML, including DOM and SimpleXML.
Generally speaking, I recommend looking into SimpleXML first, since it's the more accessible of the two.

For starters, use simplexml_load_file() to read an XML file into an object for further processing.
You should also check out the SimpleXML basic examples page on php.net.
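One detail worth knowing when starting out: simplexml_load_file() returns false when the file cannot be read or parsed, so it helps to check the result. Here is a minimal sketch of that check; loadXmlFile() is a hypothetical helper name, and the temporary file only stands in for a real export.

```php
<?php
// Hypothetical helper: load an XML file and throw with libxml's messages on failure.
function loadXmlFile(string $path): SimpleXMLElement
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_file($path);
    if ($xml === false) {
        $messages = [];
        foreach (libxml_get_errors() as $error) {
            $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
        }
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        throw new RuntimeException("Could not parse $path: " . implode('; ', $messages));
    }
    libxml_use_internal_errors($previous);
    return $xml;
}

// Demo with a small temporary file in place of the real export.
$path = tempnam(sys_get_temp_dir(), 'wxr');
file_put_contents($path, '<rss><channel><title>Demo</title></channel></rss>');
$xml = loadXmlFile($path);
echo $xml->channel->title, PHP_EOL;
unlink($path);
```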


1

I don't have any experience in XML under PHP

Take a look at simplexml_load_file() or DOMDocument.

<excerpt:encoded><![CDATA[Encoded text here]]></excerpt:encoded>

This should not be a problem for the XML parser. However, you will have a problem with the content exported by WordPress. For example, it can contain WordPress shortcodes, which will come across in their raw format instead of expanded.
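To get a feel for how many shortcodes you are dealing with, you could scan the exported content for them before deciding how to convert each one. This is a rough sketch only: findShortcodeTags() is a hypothetical helper, and the pattern is a heuristic that will also match ordinary bracketed words, so treat its output as a list of candidates.

```php
<?php
// Hypothetical helper: list the distinct shortcode-like tag names found in a string.
function findShortcodeTags(string $content): array
{
    // Heuristic: "[" + tag name + optional attributes + "]".
    // Closing tags such as [/caption] start with "/" and are not matched.
    preg_match_all('/\[([a-zA-Z][\w-]*)(?:\s[^\]]*)?\]/', $content, $matches);
    return array_values(array_unique($matches[1]));
}

$content = 'Intro [gallery ids="1,2"] and [caption id="a"]Photo[/caption] plus [gallery].';
$tags = findShortcodeTags($content);
print_r($tags);
```

Running this over every item's content gives an inventory of the shortcodes that need handling in the new CMS.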

Better Approach

Check whether the system you are migrating to can import a WordPress export directly. Many other systems can: Drupal, Joomla, Octopress, etc.

1 Comment

The target is a custom CMS written by someone else. If there are shortcodes, I will probably handle them manually in PHP (somehow).
1

Although Adam is absolutely right, his answer needs a bit more detail. Here's a simple script that should get you going.

    $xmlfile = simplexml_load_file('yourxmlfile.xml');
    foreach ($xmlfile->channel->item as $item) {
        // xpath() returns an array of matching elements
        var_dump($item->xpath('title'));
        var_dump($item->xpath('wp:post_type'));
    }


-1

simplexml_load_file() is the way to go for creating an object, but you will also need to deal with namespaces, because WordPress uses them. Plain property access in SimpleXML does not see prefixed elements such as wp:category, so use xpath() (or children() with the namespace prefix) to reach them.

    $xml = simplexml_load_file($file);
    $categories = $xml->xpath('/rss/channel/wp:category');
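As a slightly fuller sketch of the XPath route, the example below reads the category names from a small inline sample. The sample XML and its namespace URI are assumptions for illustration. Registering the prefix explicitly is a defensive choice here, not something the query is known to require.

```php
<?php
// Inline sample standing in for the real export (layout and URI are assumed).
$sample = <<<'XML'
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <wp:category>
      <wp:category_nicename>news</wp:category_nicename>
      <wp:cat_name><![CDATA[News]]></wp:cat_name>
    </wp:category>
    <wp:category>
      <wp:category_nicename>events</wp:category_nicename>
      <wp:cat_name><![CDATA[Events]]></wp:cat_name>
    </wp:category>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($sample);

// Bind the "wp" prefix used in the XPath expression to the URI declared in the file.
$namespaces = $xml->getDocNamespaces();
$xml->registerXPathNamespace('wp', $namespaces['wp']);

$names = [];
foreach ($xml->xpath('/rss/channel/wp:category') as $category) {
    // Inside each match, children('wp', true) exposes the wp-prefixed child elements.
    $names[] = (string) $category->children('wp', true)->cat_name;
}
print_r($names);
```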

I would recommend looking at what WordPress uses for importing the files.

https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/class-wp-importer.php
