function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/25.0.1");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIE, 'long cookie here');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}

The original URL I'm feeding it is http://example.com/i-123.html, but if I open it in a browser, I get redirected to https://example.com/item-description-123.html (which is why I added CURLOPT_FOLLOWLOCATION).

However, the output of this function is binary data.

1f8b 0800 0000 0000 0003 ed7d e976 db38
f2ef e7f8 2930 9ac9 d86e 9b92 b868 f3a2
3e5e 9374 67fb c7ee 74f7 e4e6 f880 2428
31a6 4835 172f 3dd3 8f74 3fde 17b8 f7c5
6e15 008a 8ba8 2db1 3ce9 25a7 dba4 4810
......
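(For reference, the leading bytes `1f 8b` are the gzip magic number, which is a strong hint the response body is gzip-compressed rather than corrupted. A minimal sketch of detecting and decompressing such a body in PHP; the helper name `maybe_gunzip` is hypothetical, and `gzdecode()` requires the zlib extension:)

```php
<?php
// Decompress a response body if it starts with the gzip magic bytes (1f 8b).
// Helper name is hypothetical; gzdecode()/gzencode() come from ext-zlib.
function maybe_gunzip($body) {
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body; // not gzipped (or decoding failed): return unchanged
}

// Round-trip demonstration with locally gzipped data:
$html = "<html><body>item 123</body></html>";
var_dump(maybe_gunzip(gzencode($html)) === $html); // bool(true)
```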

How do I fix this? I tried adding

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);

(copied from somewhere). It didn't work.

file_get_contents() gives me the same output.

  • How do you print the output? How did you get that column data on your screen? Commented Feb 2, 2015 at 18:09
  • Via terminal: $ php parser.php > output Commented Feb 2, 2015 at 18:10
  • If PHP echoes binary data, it is just displayed as broken characters. I don't get how you got those columns on your screen. Commented Feb 2, 2015 at 18:13
  • Well, the command written above doesn't echo the output in the terminal, but saves it into a file. When you open the file with a text editor, you see what I posted. Commented Feb 2, 2015 at 18:14
  • Try switching your text editor to UTF-8 text mode instead of binary. Commented Feb 2, 2015 at 18:21

1 Answer


Well, the solution was pathetic...

Using wget -S http://example.com I found out that the content is compressed (gzipped). Using gunzip I successfully extracted the HTML.

I also added this to my original PHP script:

curl_setopt($ch, CURLOPT_ENCODING, "");

And it worked like a charm.
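Put together, the corrected function might look like the sketch below (the asker's code with the CURLOPT_ENCODING fix applied; passing an empty string tells cURL to advertise every encoding it supports and to decompress the response transparently). Requires ext-curl:

```php
<?php
// Fetch a URL, letting cURL negotiate and decode gzip/deflate transparently.
function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the http -> https redirect
    curl_setopt($ch, CURLOPT_ENCODING, "");         // "" = accept all supported encodings, auto-decode
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}
```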


5 Comments

Very interesting. A thing to remember. Glad you found the answer!
Wow. I thought the site was using some sort of trickery to return garbage and prevent me from scraping it. Thanks!
Or add the --compressed option for automatic gunzipping.
As a side note: I was running curl from the CLI. Adding --compressed as an option meant it then correctly downloaded as HTML. This answer pushed me in the right direction :)
This also worked for me. On Windows, I piped the output to download.gzip and then extracted it with 7zip.
