How to parse RSS non-XML feed in PHP?

Question

I've been parsing tons of RSS feeds using PHP's simplexml_load_file and it works like a charm. Now I'm trying to do the same for the RSS feed of the Financial Times. When I do...

$rss = simplexml_load_file("http://www.ft.com/rss/world");

... I get:

Warning: simplexml_load_file(): http://www.ft.com/rss/world:11: parser error : Opening and ending tag mismatch: link line 8 and head in rss.php on line 6

Warning: simplexml_load_file(): oat:left;margin-right:20px;margin-top:3px;width:35px;height:31px;}</style></head in rss.php on line 6

Warning: simplexml_load_file(): ^ in rss.php on line 6

Warning: simplexml_load_file(): http://www.ft.com/rss/world:37: parser error : Opening and ending tag mismatch: input line 37 and li in rss.php on line 6

Warning: simplexml_load_file(): ^ in rss.php on line 6

and many, many more warnings (around 100).

I've searched Stack Overflow for answers, but I can't find anything that applies to this case. What am I missing here?

hakre · Accepted Answer · 2014-06-19 21:50:31Z

1

Some websites only respond as expected when the HTTP request carries a User-Agent header. PHP's default user_agent setting may be empty (which seems a sane setting privacy-wise), so you need to set it before making the request:

ini_set('user_agent', "Godzilla/42.4 (Gabba Gandalf Client 7.3; C128; Z80) Lord of the RSS Weed Edition (KHTML, like Gold Dust Day Gecko) Chrome/97.0.43043.0 Safari/1337.42");

$rss = simplexml_load_file("http://www.ft.com/rss/world");
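If changing the global user_agent setting is undesirable, the same idea can be applied per request through a stream context. The following is only a sketch: feedContext() and loadFeed() are hypothetical helper names, the 10-second timeout is an arbitrary assumption, and whether a given site accepts a particular User-Agent string has to be tested.

```php
<?php
// Hypothetical helper: builds a stream context that sends a User-Agent
// header with HTTP requests made through it.
function feedContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // sent as the User-Agent header
            'timeout'    => 10,         // seconds; an assumed value
        ],
    ]);
}

// Hypothetical helper: fetches $url with the given User-Agent and parses
// the body as XML. Returns a SimpleXMLElement, or false on any failure.
function loadFeed(string $url, string $userAgent)
{
    $body = @file_get_contents($url, false, feedContext($userAgent));
    if ($body === false) {
        return false; // the request itself failed
    }

    // Keep libxml from emitting a warning per parse error.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml;
}
```

A call such as loadFeed("http://www.ft.com/rss/world", "MyFeedReader/1.0") would then leave the ini setting untouched for the rest of the script; "MyFeedReader/1.0" is a made-up agent string.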


Othi · Accepted Answer · 2014-05-22 11:30:20Z

0

Your code works for me here. Try omitting LIBXML_NOWARNING & LIBXML_NOERROR (which suppress any errors you might be getting) to see where it went wrong.
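To see what the parser is complaining about without a wall of warnings, the libxml errors can also be collected and inspected directly. This is a minimal sketch that works on a string that has already been fetched; parseFeedString() is a hypothetical helper name, not part of SimpleXML.

```php
<?php
// Hypothetical helper: parses $body as XML and returns a pair of
// [SimpleXMLElement|false, list of readable parser messages].
function parseFeedString(string $body): array
{
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($body);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$xml, $messages];
}
```

If the first element is false and the messages mention tags such as head, link or input, the server most likely returned an HTML page rather than a feed.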


3 Comments

TheBigDoubleA Over a year ago

Have you tried with the FT feed? I omitted the LIBXML flags, but the result is still the same: var_dump returns false. Please bear in mind that this code works fine for most other feeds...

Othi Over a year ago

It appears you're getting HTML from the URL. Try fetching it with file_get_contents and echo'ing it to see what your webserver is receiving. Maybe they're filtering some user agents from fetching their feed.
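A rough way to check what came back, once the body has been fetched with file_get_contents, is to look at the first few hundred bytes. The sketch below is a heuristic only: sniffBody() is a hypothetical name, the 512-byte window is an arbitrary assumption, and it will misjudge unusual documents.

```php
<?php
// Hypothetical helper: guesses whether a response body is an HTML page
// or an XML feed by inspecting its first 512 bytes.
// Returns 'html', 'feed' or 'unknown'.
function sniffBody(string $body): string
{
    $head = strtolower(ltrim(substr($body, 0, 512)));

    // Check HTML first, since XHTML pages can also start with an XML declaration.
    if (strpos($head, '<!doctype html') === 0 || strpos($head, '<html') !== false) {
        return 'html';
    }
    if (strpos($head, '<?xml') === 0
        || strpos($head, '<rss') !== false
        || strpos($head, '<feed') !== false) {
        return 'feed';
    }
    return 'unknown';
}
```

For example, sniffBody(file_get_contents($url)) returning 'html' for a feed URL would suggest the server is sending a regular web page, possibly because of the request's user agent.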

TheBigDoubleA Over a year ago

You're right: I get an HTML page, which is this one: ft.com/gfdlgjfdglkfjdgd. How can I overcome this?
