I'm trying to use regular expressions to extract the CDATA from the following XML feed: http://www.patrickarundell.net/THREE-IE-FEED.asp

My code is as follows:

$xml = file_get_contents('http://www.patrickarundell.net/THREE-IE-FEED.asp');

$arr = array();
preg_match('/(CDATA)(.*)/', $xml, $arr);
echo '<pre>';
    print_r($arr);
echo '</pre>';

The output is:

Array
(
    [0] => CDATA[
    [1] => CDATA
    [2] => [
)

I know I don't have the regular expression quite right, but when I try the following statement:

preg_match('/(<![CDATA[)(.*)/', $xml, $arr);

I get an error:

Warning: preg_match() [function.preg-match]: Compilation failed: missing terminating ] for character class at offset 15

I thought this might give me the details after the square bracket '[', which is what I'm looking for.

Any help is appreciated. I've been trying this for a few hours with no luck.

  • Did you consider an XML parser? Commented Apr 27, 2011 at 20:47
  • Yes, I'm using SimpleXMLElement to parse the rest of the file and that works fine; it doesn't give me any problems. But I can't get the details in the CDATA part using SimpleXMLElement. If you look at the XML file, the actual horoscope detail is under the <horoscope> node. When I reference this node, it lumps all the data together. Commented Apr 27, 2011 at 23:01

1 Answer

The error message appears because an unescaped [ opens a character class, and the pattern never supplies the closing ] for it. You don't want to define a character class with your [, you want to match a literal [, so you need to escape it as \[.

<!\[(CDATA)\[\s*(.*?)\s*\]\]>

I tested it here on regexr

The .*? is a non-greedy match: it matches as little as possible, stopping at the first closing ]]> it finds.
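
As a rough sketch of how the escaped pattern might be used from PHP: the sample string below is a made-up stand-in for the feed (the real feed's structure may differ), and the s modifier is an addition not in the pattern above, included on the assumption that a CDATA section may span several lines (without it, . does not match newlines).

```php
<?php
// Hypothetical sample standing in for the feed contents; the real feed may differ.
$xml = '<horoscope><![CDATA[ Aries: a good day. ]]></horoscope>';

// Both "[" are escaped so they are matched literally instead of opening
// a character class. The "s" modifier (an assumption, see above) lets "."
// match newlines as well.
$pattern = '/<!\[(CDATA)\[\s*(.*?)\s*\]\]>/s';

$arr = array();
if (preg_match($pattern, $xml, $arr)) {
    // $arr[1] is the literal "CDATA"; $arr[2] is the text inside the section.
    echo $arr[2], "\n";
}
```

With this pattern the section content ends up in group 2, because (CDATA) is itself a capturing group; dropping those parentheses would move the content to group 1.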

2 Comments

stema, thanks for this. Just one question: I managed to get the first CDATA value into the array, but as you can see from the XML there are a number of other CDATA segments. How do I handle these?
@Stephen, I don't know PHP very well, but there is a function preg_match_all; try using it instead of preg_match. According to the documentation, it should do what you want.
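
A minimal sketch of the preg_match_all suggestion, assuming a feed with several CDATA sections. The two-item sample string is invented for illustration, and the pattern drops the capturing group around CDATA so that the content lands in group 1.

```php
<?php
// Hypothetical sample with two CDATA sections; the real feed may differ.
$xml = '<item><horoscope><![CDATA[ First ]]></horoscope></item>'
     . '<item><horoscope><![CDATA[ Second ]]></horoscope></item>';

// Same idea as the answer's pattern, with only the content captured.
// The "s" modifier lets "." match newlines inside a section.
$pattern = '/<!\[CDATA\[\s*(.*?)\s*\]\]>/s';

// preg_match_all returns the number of full matches. With the default
// PREG_PATTERN_ORDER flag, $matches[1] holds group 1 of every match.
$matches = array();
$count = preg_match_all($pattern, $xml, $matches);

echo '<pre>';
print_r($matches[1]);
echo '</pre>';
```

Because the match is non-greedy, each section stops at its own closing ]]> and neighbouring sections are not merged into one match.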