I'm working on an XML reader and am running into an odd issue with a few feeds. With cURL or even file_get_contents, the feeds come back as binary data more often than as readable XML. Whenever I load the feed in a browser, it looks fine.

The specific feed is http://www.winnipegsun.com/home/rss.xml

The code I am using is

$string = file_get_contents("http://www.winnipegsun.com/home/rss.xml");
var_dump( $string );

2 Answers

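The response is gzipped. You can see this in the HTTP response headers:

Content-Encoding: gzip

Unzip it with PHP. The substr() call skips the fixed 10-byte gzip header so that gzinflate() receives the raw deflate stream:

$string = gzinflate(substr($string, 10));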

http://php.net/manual/en/function.gzinflate.php
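
Since the feed is only compressed some of the time, it may help to decode conditionally. The following is a minimal sketch, not part of the original answer: `maybe_gunzip` is a hypothetical helper name, and it assumes PHP 5.4 or later with the zlib extension, where `gzdecode()` handles the full gzip format (header and trailer) instead of relying on the fixed 10-byte offset.

```php
<?php
// Hypothetical helper (not from the original answer): decode the payload
// only when it starts with the gzip magic bytes 0x1f 0x8b.
function maybe_gunzip(string $data): string
{
    if (strncmp($data, "\x1f\x8b", 2) !== 0) {
        return $data; // no gzip signature, return the bytes unchanged
    }

    // gzdecode() returns false (and emits a warning) on corrupt input;
    // in that case fall back to the raw bytes.
    $decoded = gzdecode($data);

    return $decoded === false ? $data : $decoded;
}
```

With this, `$string = maybe_gunzip(file_get_contents($url));` should yield plain XML whether or not the server compressed the response.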

Hope that helps... cheers

4 Comments

That did help. So should I check the headers of the feeds to see if they are zipped before I pass them to simple xml? Is that possible? Or would it be simpler to attempt to unzip the string and if it fails assume it is not zipped?
Thanks! I assumed it was something on their end :) To answer my own question on how to determine whether it is gzipped: I found the function get_headers, and combined with array_search I wrote the following: $string = file_get_contents($feed[1]); if( array_search( "Content-Encoding: gzip", get_headers($feed[1])) ) { $string = gzinflate(substr($string, 10)); }
You can definitely do that. I'll try to work it into my HTTP client class that I wrote around PHP Curl: github.com/homer6/altumo/blob/master/source/php/Http/… and I'll post some sample code. I've also heard that people are very happy with zend's http client: framework.zend.com/manual/en/zend.http.client.adapters.html
Okay, so I wrote you a new class. The sample code in this markdown document applies directly to your case. github.com/homer6/altumo/blob/master/source/php/Http/… Go Canada! :-)
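
The header check discussed in the comments above can be made a little more robust. This is only a sketch: `has_gzip_encoding` is a hypothetical helper name, and it assumes the headers arrive as a list of raw header lines, which is what `get_headers()` returns by default. Unlike an exact `array_search`, it matches case-insensitively and tolerates extra whitespace.

```php
<?php
// Hypothetical helper: returns true if any raw header line declares a
// gzip content encoding ("gzip" or the legacy alias "x-gzip").
function has_gzip_encoding(array $headers): bool
{
    foreach ($headers as $line) {
        if (preg_match('/^Content-Encoding:\s*(x-)?gzip\s*$/i', $line)) {
            return true;
        }
    }

    return false;
}
```

One caveat: `get_headers()` issues its own request, so its headers do not necessarily describe the body that a separate `file_get_contents()` call fetched. Given that this server's responses vary between requests, inspecting the body itself may be the safer check.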

You should be able to send an empty Accept-Encoding header to the server. It should then either send the content uncompressed or return a 406 Not Acceptable response:

$string = file_get_contents(
    "http://www.winnipegsun.com/home/rss.xml",
    FALSE,
    stream_context_create(
        array(
            'http' => array(
                'method'  => "GET",
                'header'  => "Accept-Encoding:\r\n" // option key is 'header'; double quotes so \r\n is a real CRLF
            )
        )
    )
);
var_dump($string);

I am not sure the web server is configured correctly though, because it wouldn't respond to that with the uncompressed feed, even when I added Cache-Control headers telling it not to send a cached response. Oddly enough, just doing

$string = file_get_contents("http://www.winnipegsun.com/home/rss.xml?".time());

worked out of the box. And you can also send a POST request.

2 Comments

Could the server be returning a gzipped cached version of the page? That may explain why it works sometimes (the cache expires) and then fails for the next several requests (the cached copy is being returned). Unfortunately it's not my server, just data I have to deal with. But that may explain why adding a time GET parameter would fix it.
@Jeff I'm not sure what's causing it. I suspected a cached version, since the response headers indicate it came via a cache server, but I tried sending a Cache-Control: no-cache header and that did nothing. It also doesn't explain why I can leave out Accept-Encoding completely and just add the time() param. It's odd.
