I am trying to fetch data using cURL, but it returns an empty response for most websites. My code is below:

$responses = multi([
    'blocket' => ['url' => 'http://blocket.se','opts' => [ CURLOPT_RETURNTRANSFER => true]]
]);
print_r($responses);


function multi(array $requests, array $opts = []) { 
// create array for curl handles
$chs = [];
// merge general curl options args with defaults
$opts += [CURLOPT_CONNECTTIMEOUT => 3, CURLOPT_TIMEOUT => 3, CURLOPT_RETURNTRANSFER => 1];
// create array for responses
$responses = [];
// init curl multi handle
$mh = curl_multi_init();
// create running flag
$running = null;
// cycle through requests and set up
foreach ($requests as $key => $request) {

    // init individual curl handle
    $chs[$key] = curl_init();
    // set url
    curl_setopt($chs[$key], CURLOPT_URL, $request['url']);
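    // remember the optional scraper tag (null when the request does not define one)
    $scraper[$key] = isset($request['scraper']) ? $request['scraper'] : null;
    // check for post data and handle if present
    if (isset($request['post_data'])) {
        curl_setopt($chs[$key], CURLOPT_POST, 1);
        // send the same key that was checked above
        curl_setopt($chs[$key], CURLOPT_POSTFIELDS, $request['post_data']);
    }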
    // set opts 
    curl_setopt_array($chs[$key], (isset($request['opts']) ? $request['opts'] + $opts : $opts));
    curl_multi_add_handle($mh, $chs[$key]);
}
do {
    // execute curl requests
    curl_multi_exec($mh, $running);
    // block to avoid needless cycling until change in status
    curl_multi_select($mh);
// check flag to see if we're done
} while($running > 0);
// cycle through requests
foreach ($chs as $key => $ch) {
    // handle error
    if (curl_errno($ch)) {
        $responses[$key] = ['data' => null, 'info' => null, 'error' => curl_error($ch), 'scraper' => $scraper[$key]];
    } else {
        // save successful response
        $responses[$key] = ['data' => curl_multi_getcontent($ch), 'info' => curl_getinfo($ch), 'error' => null, 'scraper' => $scraper[$key]];
    }
    // remove individual handle from the multi handle
    curl_multi_remove_handle($mh, $ch);
}
// close multi handle
curl_multi_close($mh);
// return responses
return $responses;
}

Result

Array
(
    [blocket] => Array
        (
            [data] =>
            [info] => Array
                (
                    [url] => http://blocket.se/
                    [content_type] =>
                    [http_code] => 302
                    [header_size] => 119
                    [request_size] => 49
                    [filetime] => -1
                    [ssl_verify_result] => 0
                    [redirect_count] => 0
                    [total_time] => 0.328
                    [namelookup_time] => 0
                    [connect_time] => 0.172
                    [pretransfer_time] => 0.172
                    [size_upload] => 0
                    [size_download] => 0
                    [speed_download] => 0
                    [speed_upload] => 0
                    [download_content_length] => 0
                    [upload_content_length] => -1
                    [starttransfer_time] => 0.328
                    [redirect_time] => 0
                    [redirect_url] => https://www.blocket.se
                    [primary_ip] => 185.49.132.3
                    [certinfo] => Array
                        (
                        )

                    [primary_port] => 80
                    [local_ip] => 192.168.0.135
                    [local_port] => 58357
                )

            [error] =>
        )

)

As you can see in the result, [data] is empty.

UPDATE

After discussing this with @Sahil, I discovered that the code above works fine for websites that do not use SSL, but fails for those that do. I tried setting CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST along with CURLOPT_FOLLOWLOCATION, but none of these has helped so far.

2 Answers

1

As @Sahil pointed out, the code is fine. The issue was that cURL wasn't working for HTTPS websites, because no CA root certificate bundle was defined in php.ini.

If you have a similar issue, visit http://curl.haxx.se/docs/caextract.html and download the CA certificate bundle (cacert.pem). Save it at your desired location and set the absolute path to this file in your php.ini, for example:

curl.cainfo = c:\wamp\cacert.pem
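
To check whether PHP actually picks up this setting, a small diagnostic along these lines may help. It is only a sketch: caBundleStatus() is a hypothetical helper, not part of PHP or of the code in the question. Keep in mind that the web server usually has to be restarted after php.ini is edited, and that the command line may load a different php.ini than the web server does.

```php
<?php
// Hypothetical diagnostic helper: classifies a CA bundle path as
// 'unset', 'missing' or 'readable'.
function caBundleStatus($path): string
{
    if ($path === false || $path === null || $path === '') {
        return 'unset'; // directive unknown or left empty
    }
    return is_readable($path) ? 'readable' : 'missing';
}

// Which php.ini was loaded for this process (false when none was loaded).
echo 'php.ini: ', php_ini_loaded_file() ?: '(none)', PHP_EOL;

// ini_get() returns false when the directive is not registered,
// for example when the cURL extension is not loaded.
echo 'curl.cainfo: ', caBundleStatus(ini_get('curl.cainfo')), PHP_EOL;
```

Running it both from the command line and through the web server shows whether the two environments agree.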

As mentioned on various websites, setting CURLOPT_SSL_VERIFYPEER to false disables certificate verification and leaves your requests open to man-in-the-middle attacks, so avoid it as a workaround.
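
If editing php.ini is not an option, the CA bundle can also be supplied per request through the 'opts' key that multi() already accepts. The following is a minimal sketch rather than the asker's code: the helper name httpsOptions(), the bundle path, and the redirect cap of 5 are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of the original code): builds cURL options
// that keep certificate verification on and point cURL at a CA bundle.
function httpsOptions(string $caBundlePath): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => true,          // verify the server certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,             // verify the certificate matches the host name
        CURLOPT_CAINFO         => $caBundlePath, // CA bundle file, e.g. the downloaded cacert.pem
        CURLOPT_FOLLOWLOCATION => true,          // follow 3xx redirects such as http -> https
        CURLOPT_MAXREDIRS      => 5,             // assumed cap on the redirect chain
    ];
}

// Assumed path; adjust to wherever the bundle was saved.
$opts = httpsOptions('c:\wamp\cacert.pem');
```

The result can be passed as 'opts' => $opts in a request entry. Because multi() merges with $request['opts'] + $opts, these per-request values take precedence over the defaults, while the 3-second timeouts still apply.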

0

Your code works fine; the only problem is your URL. The current URL redirects with a 302 response, and the request does not follow the redirect, so the body is empty. Request the redirect target directly.

Change your URL from:

http://www.blocket.se/

to:

https://www.blocket.se/

PHP code:

<?php

ini_set('display_errors', 1);
$responses = multi([
    'blocket' => ['url' => 'https://www.blocket.se/', 'opts' => [CURLOPT_RETURNTRANSFER => true]]
]);
print_r($responses);

function multi(array $requests, array $opts = [])
{

    $chs = [];

    $opts += [CURLOPT_CONNECTTIMEOUT => 3, CURLOPT_TIMEOUT => 3, CURLOPT_RETURNTRANSFER => 1];

    $responses = [];

    $mh = curl_multi_init();

    $running = null;

    foreach ($requests as $key => $request)
    {
        $chs[$key] = curl_init();
        curl_setopt($chs[$key], CURLOPT_URL, $request['url']);
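        // optional scraper tag; null when the request does not define one
        $scraper[$key] = isset($request['scraper']) ? $request['scraper'] : null;
        if (isset($request['post_data']))
        {
            curl_setopt($chs[$key], CURLOPT_POST, 1);
            // send the same key that was checked above
            curl_setopt($chs[$key], CURLOPT_POSTFIELDS, $request['post_data']);
        }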
        curl_setopt_array($chs[$key], (isset($request['opts']) ? $request['opts'] + $opts : $opts));
        curl_multi_add_handle($mh, $chs[$key]);
    }
    do
    {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);
    } while ($running > 0);
    foreach ($chs as $key => $ch)
    {
        if (curl_errno($ch))
        {
            $responses[$key] = ['data' => null, 'info' => null, 'error' => curl_error($ch), 'scraper' => $scraper[$key]];
        } else
        {
            $responses[$key] = ['data' => curl_multi_getcontent($ch), 'info' => curl_getinfo($ch), 'error' => null, 'scraper' => $scraper[$key]];
        }
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $responses;
}
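
When CURLOPT_FOLLOWLOCATION is off, the redirect target is still reported in the info array as redirect_url, as the result in the question shows. A rough sketch of spotting such responses follows; findRedirects() is a hypothetical helper, and the sample data is trimmed from the question's output rather than fetched live.

```php
<?php
// Hypothetical helper: lists responses whose status is 3xx and that carry a
// redirect target, using the 'info' array shape returned by multi().
function findRedirects(array $responses): array
{
    $redirects = [];
    foreach ($responses as $key => $response) {
        $info = isset($response['info']) ? $response['info'] : null;
        if (!is_array($info)) {
            continue; // failed transfer: multi() stores null here
        }
        $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;
        if ($code >= 300 && $code < 400 && !empty($info['redirect_url'])) {
            $redirects[$key] = $info['redirect_url'];
        }
    }
    return $redirects;
}

// Sample data trimmed from the result shown in the question.
$sample = [
    'blocket' => [
        'data'  => '',
        'info'  => [
            'url'          => 'http://blocket.se/',
            'http_code'    => 302,
            'redirect_url' => 'https://www.blocket.se',
        ],
        'error' => null,
    ],
];
print_r(findRedirects($sample));
```

Each returned URL can then be requested again, or CURLOPT_FOLLOWLOCATION can be enabled so cURL follows the redirect itself.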

4 Comments

As per your recommendation I tried blocket.se, but this didn't return any data either, though http_code changed to 0. See the result below: Array ( [blocket] => Array ( [data] => [info] => Array ( [url] => blocket.se [content_type] => [http_code] => 0 [header_size] => 0
Yes: $responses = multi([ 'blocket' => ['url' => 'blocket.se','opts' => [ CURLOPT_RETURNTRANSFER => true]] ]);
Can you change your environment and try again? Let me show you a demo for this.
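
An http_code of 0 with an empty error field, as reported in the comments, suggests the transfer failed without the failure being recorded. As far as I know, curl_errno() on an individual handle is not reliably set for transfers driven by curl_multi_exec(), and curl_multi_info_read() is the call that reports each transfer's result code. The sketch below illustrates this under that assumption; runAndCollectResults() is a hypothetical name, and it stands in for the do/while loop in multi() rather than reproducing the author's code.

```php
<?php
// Hypothetical helper: runs all handles attached to $mh and returns the
// libcurl result code per key (0 is CURLE_OK).
function runAndCollectResults($mh, array $chs): array
{
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            // Wait for activity; curl_multi_select() may return -1,
            // so back off briefly instead of spinning.
            if (curl_multi_select($mh, 1.0) === -1) {
                usleep(100000);
            }
        }
    } while ($running > 0
        && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    $results = [];
    while (($msg = curl_multi_info_read($mh)) !== false) {
        // Map the finished handle back to the caller's key.
        $key = array_search($msg['handle'], $chs, true);
        if ($key !== false) {
            $results[$key] = $msg['result'];
        }
    }
    return $results;
}
```

A non-zero code can be turned into a readable message with curl_strerror($code) and stored in the 'error' field of the response.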
