
I want to parse a lot of URLs to only get their status codes.

So what I did is:

$handle = curl_init($url->loc);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_HEADER, true);  // we want headers
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($handle);
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
curl_close($handle);

But as soon as the NOBODY option is set to true, the returned status codes are incorrect (google.com returns 302, other sites return 303).

Setting this option to false is not possible because of the performance loss.

Any ideas?

  • Do a custom request and issue only a HEAD. Doing a full-blown GET will also transfer the body; HEAD gives you ONLY the headers. Commented Dec 1, 2014 at 19:59
  • @MarcB could you show me your proposed code? Commented Dec 1, 2014 at 20:00
  • curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD') Commented Dec 1, 2014 at 20:03

3 Answers


The default HTTP request method for curl is GET. If you want only the response headers, you can use the HTTP method HEAD.

curl_setopt($handle, CURLOPT_CUSTOMREQUEST, 'HEAD');

As @Dai's answer points out, CURLOPT_NOBODY already makes cURL use the HEAD method, so the option above will not help.

Another option would be to use fsockopen to open a connection and write the request headers with fwrite. Read the response using fgets until the first occurrence of \r\n\r\n to get the complete header. Since you need only the status code, you just have to read the first dozen or so characters of the status line.

<?php
// Open a raw TCP connection and speak HTTP/1.1 by hand.
$fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
if ($fp) {
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.google.com\r\n";
    $out .= "Accept-Encoding: gzip, deflate, sdch\r\n";
    $out .= "Accept-Language: en-GB,en-US;q=0.8,en;q=0.6\r\n";
    $out .= "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36\r\n";
    $out .= "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    // fgets($fp, 13) reads at most 12 bytes, e.g. "HTTP/1.1 302".
    $tmp = explode(' ', fgets($fp, 13));
    echo $tmp[1]; // the status code
    fclose($fp);
}

7 Comments

So I'd just have to add this option or remove one of the already set options?
Yes, add it before curl_exec. The HEAD method does not contain a response body, so you can remove CURLOPT_NOBODY.
As per @Dai's answer, it is not going to work. Sorry.
unfortunately, that doesn't execute as fast as with CURLOPT_NOBODY (in fact, it seems to be just as slow as without CURLOPT_NOBODY and without a custom request) and the returned status code is still wrong...
Try the above code. It gives 302 for www.google.com, and is very fast too.

cURL's NOBODY option makes it use the HEAD HTTP verb. I'd wager the majority of non-static web applications in the wild don't handle this verb correctly, hence the differing results you're seeing. I suggest making a normal GET request and discarding the response.
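Discarding the response can be done without ever buffering it, via a write callback. A rough sketch (statusCode() is an illustrative helper name, not from the original post; options mirror the question's setup):

```php
<?php
// Sketch: do a normal GET, but throw the body away chunk by chunk via
// CURLOPT_WRITEFUNCTION, so nothing is buffered in memory and the status
// code comes from a request the server treats as ordinary traffic.
function statusCode(string $url): int
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
    // Returning the chunk length tells cURL the data was handled.
    curl_setopt($handle, CURLOPT_WRITEFUNCTION, function ($h, $chunk) {
        return strlen($chunk);
    });
    curl_exec($handle);
    $code = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);
    return $code;
}

// echo statusCode('http://www.example.com');
```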

5 Comments

the problem is performance - I need to do thousands of such requests in a row. Is there anything else I could do to speed it up?
cheap multi-threading, divide the number of urls to check by the number of cores you have, run one script per core doing a subset of the url's
you're right... it doesn't help that PHP's flush() doesn't work because the server is running nginx; otherwise I could at least write out some status...
strangely I just found out, all status codes >= 400 are correct, and because that's all I need, it's ok for me...
OK, this code is really insanely fast, but my URLs don't seem to work - e.g. www.raffiniert.biz/kunden/coop_ch, which should return a 403, just returns nothing, even if the URL has a trailing slash?
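Building on the one-script-per-core idea from the comments, PHP's curl_multi API can also run many of these checks concurrently in a single process. A sketch under the assumption that only the status code matters (statusCodes() is an illustrative name):

```php
<?php
// Sketch: fetch status codes for many URLs concurrently with curl_multi,
// instead of issuing thousands of sequential requests.
function statusCodes(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $h = curl_init($url);
        curl_setopt($h, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($h, CURLOPT_SSL_VERIFYPEER, false);
        curl_multi_add_handle($multi, $h);
        $handles[$url] = $h;
    }
    // Drive all transfers until none are still running.
    do {
        curl_multi_exec($multi, $running);
        curl_multi_select($multi);
    } while ($running > 0);
    $codes = [];
    foreach ($handles as $url => $h) {
        $codes[$url] = curl_getinfo($h, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $h);
        curl_close($h);
    }
    curl_multi_close($multi);
    return $codes;
}
```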

I suggest get_headers() instead:

<?php
$url = 'http://www.example.com';

print_r(get_headers($url));

print_r(get_headers($url, 1));
?>
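get_headers() returns the status line as element 0 (e.g. "HTTP/1.1 200 OK"), so extracting the numeric code is one split away. A small sketch (parseStatusLine() is a helper name of my own):

```php
<?php
// Sketch: pull the numeric status out of an HTTP status line,
// e.g. "HTTP/1.1 200 OK" -> 200.
function parseStatusLine(string $line): int
{
    $parts = explode(' ', $line, 3);
    return (int) ($parts[1] ?? 0);
}

// $headers = get_headers('http://www.example.com');
// echo parseStatusLine($headers[0]);
```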

3 Comments

that's not as fast as cURL according to my tests?
ok, glad you benchmarked it. will leave for future reference.
sorry, I wanted to say: It's faster than cURL with body, but slower than cURL without body (according to my tests). YMMV
