
I've been parsing tons of RSS feeds using PHP's simplexml_load_file and it works like a charm. Now I'm trying to do the same for the RSS feed of the Financial Times. When I do...

$rss = simplexml_load_file("http://www.ft.com/rss/world");

... I get:

Warning: simplexml_load_file(): http://www.ft.com/rss/world:11: parser error : Opening and ending tag mismatch: link line 8 and head in rss.php on line 6

Warning: simplexml_load_file(): oat:left;margin-right:20px;margin-top:3px;width:35px;height:31px;}</style></head in rss.php on line 6

Warning: simplexml_load_file(): ^ in rss.php on line 6

Warning: simplexml_load_file(): http://www.ft.com/rss/world:37: parser error : Opening and ending tag mismatch: input line 37 and li in rss.php on line 6

Warning: simplexml_load_file(): ^ in rss.php on line 6

and many, many more warnings (around 100).

I've searched Stack Overflow for answers, but I can't find anything that seems to apply to this case. What am I missing here?

2 Answers


Some websites only respond properly when the HTTP request carries a User-Agent header. PHP's user_agent ini setting is empty by default (which seems a sane setting privacy-wise), so you need to set it before making the request:

ini_set('user_agent', "Godzilla/42.4 (Gabba Gandalf Client 7.3; C128; Z80) Lord of the RSS Weed Edition (KHTML, like Gold Dust Day Gecko) Chrome/97.0.43043.0 Safari/1337.42");

$rss = simplexml_load_file("http://www.ft.com/rss/world");
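If you would rather not change the user agent for the whole script, the same idea can be applied per request with a stream context. This is only a sketch: load_feed is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and whether a given user-agent string satisfies the remote server is something you have to try.

```php
<?php
// Hypothetical helper (not part of PHP): fetch a feed with an explicit
// User-Agent for this one request, then parse the body as XML.
function load_feed(string $url, string $userAgent)
{
    // 'user_agent' and 'timeout' are standard HTTP context options.
    $context = stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; arbitrary value for this sketch
        ],
    ]);

    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return false; // the request itself failed
    }

    // Returns a SimpleXMLElement, or false if the body is not valid XML.
    return simplexml_load_string($body);
}
```

Usage would look like $rss = load_feed("http://www.ft.com/rss/world", "MyFeedReader/1.0"); where the user-agent string is a made-up example.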

Your code works for me here. Try omitting LIBXML_NOWARNING and LIBXML_NOERROR (which suppress any errors you might be getting) to see where it goes wrong.
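An alternative to reading the warnings one by one is to let libxml collect the errors and then list them yourself. A minimal sketch, assuming you already have the response body as a string; parse_with_errors is a hypothetical helper name:

```php
<?php
// Hypothetical helper: parse an XML string and return both the result and
// the list of libxml error messages, instead of emitting PHP warnings.
function parse_with_errors(string $xmlText): array
{
    // Collect errors internally; remember the previous setting to restore it.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($xmlText);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$xml, $messages];
}
```

If the first element of the returned array is false, the second holds the parser's complaints, which is usually enough to tell a slightly broken feed from a page that is not a feed at all.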

3 Comments

Have you tried with the FT feed? I omitted the LIBXML options, yet it's still the same: var_dump returns false. Please bear in mind that this code works fine for most other feeds...
It appears you're getting HTML from the URL. Try fetching it with file_get_contents and echoing it to see what your web server is receiving. Maybe they're blocking some user agents from fetching their feed.
You're right: I get an HTML page, which is this one: ft.com/gfdlgjfdglkfjdgd. How can I overcome this?
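The check suggested in these comments can be sketched roughly as follows: fetch the body yourself, then look at what actually came back before handing it to SimpleXML. describe_response is a hypothetical helper name, and the list of accepted root elements (rss, feed, RDF) is an assumption about which feed formats you care about.

```php
<?php
// Hypothetical helper: say whether a response body looks like a feed,
// some other XML document, or something that is not well-formed XML
// (typically an HTML page).
function describe_response(string $body): string
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return 'not well-formed XML';
    }

    $root = $xml->getName();
    if (in_array($root, ['rss', 'feed', 'RDF'], true)) {
        return "feed ($root)";
    }
    return "XML, but root is <$root>";
}
```

Calling it on the result of file_get_contents("http://www.ft.com/rss/world"), and echoing the first few hundred characters of the body alongside, should make it clear whether the server sent the feed or an HTML page.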
