0

I have this PHP script that sends a lot of requests to websites using cURL, and I've tried to clean it up as much as possible, but I've hit a wall. What else can I do to this script to make it faster?

<?php
    ini_set('max_execution_time', 3600);
    function remoteStatusCode($url){
        $ch = curl_init();
        curl_setopt ($ch, CURLOPT_URL,$url );
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, FALSE);
        curl_exec($ch);
        $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        if ($httpcode == 404){
            echo $url;
        }
    }
    $lines = file('wordsEn.txt', FILE_IGNORE_NEW_LINES);
    foreach ($lines as $n){
        remoteStatusCode('<br>https://twitter.com/'.$n);
    }
?>

Any suggestions are good suggestions in my book, I need all the help I can get.

6
  • 1
    php.net/manual/en/function.curl-multi-init.php Commented Oct 2, 2013 at 0:02
  • meta.codereview.stackexchange.com Commented Oct 2, 2013 at 0:23
  • @Fred -ii-: I'm sure it's fits SO nice Commented Oct 2, 2013 at 0:24
  • @zerkms That's what it's there for. After all, they built it; others can use it. Commented Oct 2, 2013 at 0:50
  • @Fred -ii-: this question isn't about reviewing it, but about a particular programming advice. Commented Oct 2, 2013 at 1:03

1 Answer 1

3

This is benchmark for cUrl and Multi cUrl to get header information :

<?php
include_once "CURL.php";
$curl = new CURL();

$start_time = microtime(true);

for ($i=0; $i < 10; $i++) { 
    $ch = curl_init();
    curl_setopt ($ch, CURLOPT_URL,"https://twitter.com" );
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
}
    $end_time = microtime(true);
    echo "cUrl Timing : ";
    echo $end_time - $start_time;
    echo "\n";

$start_time = microtime(true);
$get = array(CURLOPT_HEADER=>true,CURLOPT_SSL_VERIFYHOST=>false,CURLOPT_RETURNTRANSFER=>true);
for ($i=0; $i < 10; $i++) { 
    $curl->addSession("https://twitter.com",$get);
}
$datas = $curl->exec();
$end_time = microtime(true);
echo "Multi cUrl Timing : ";
echo $end_time - $start_time;
echo "\n";

print_r($datas[1]);// http codes of each request


?>

CURL.php (referance is here, but i made some changes to get the header information in the multi cUrl class which is CURL)

<?php

/**
* OO cURL Class
* Object oriented wrapper for the cURL library.
* @author David Hopkins (semlabs.co.uk)
* @version 0.3
*/
class CURL
{

    public $sessions                =   array();
    public $retry                   =   0;

    /**
    * Adds a cURL session to stack
    * @param $url string, session's URL
    * @param $opts array, optional array of cURL options and values
    */
    public function addSession( $url, $opts = false )
    {
        $this->sessions[] = curl_init( $url );
        if( $opts != false )
        {
            $key = count( $this->sessions ) - 1;
            $this->setOpts( $opts, $key );
        }
    }

    /**
    * Sets an option to a cURL session
    * @param $option constant, cURL option
    * @param $value mixed, value of option
    * @param $key int, session key to set option for
    */
    public function setOpt( $option, $value, $key = 0 )
    {
        curl_setopt( $this->sessions[$key], $option, $value );
    }

    /**
    * Sets an array of options to a cURL session
    * @param $options array, array of cURL options and values
    * @param $key int, session key to set option for
    */
    public function setOpts( $options, $key = 0 )
    {
        curl_setopt_array( $this->sessions[$key], $options );
    }

    /**
    * Executes as cURL session
    * @param $key int, optional argument if you only want to execute one session
    */
    public function exec( $key = false )
    {
        $no = count( $this->sessions );

        if( $no == 1 )
            $res = $this->execSingle();
        elseif( $no > 1 ) {
            if( $key === false )
                $res = $this->execMulti();  
            else
                $res = $this->execSingle( $key );
        }

        if( $res )
            return $res;
    }

    /**
    * Executes a single cURL session
    * @param $key int, id of session to execute
    * @return array of content if CURLOPT_RETURNTRANSFER is set
    */
    public function execSingle( $key = 0 )
    {
        if( $this->retry > 0 )
        {
            $retry = $this->retry;
            $code = 0;
            while( $retry >= 0 && ( $code[0] == 0 || $code[0] >= 400 ) )
            {
                $res = curl_exec( $this->sessions[$key] );
                $code = $this->info( $key, CURLINFO_HTTP_CODE );

                $retry--;
            }
        }
        else
            $res = curl_exec( $this->sessions[$key] );

        return $res;
    }

    /**
    * Executes a stack of sessions
    * @return array of content if CURLOPT_RETURNTRANSFER is set
    */
    public function execMulti()
    {
        $mh = curl_multi_init();

        #Add all sessions to multi handle
        foreach ( $this->sessions as $i => $url )
            curl_multi_add_handle( $mh, $this->sessions[$i] );

        do
            $mrc = curl_multi_exec( $mh, $active );
        while ( $mrc == CURLM_CALL_MULTI_PERFORM );

        while ( $active && $mrc == CURLM_OK )
        {
            if ( curl_multi_select( $mh ) != -1 )
            {
                do
                    $mrc = curl_multi_exec( $mh, $active );
                while ( $mrc == CURLM_CALL_MULTI_PERFORM );
            }
        }

        if ( $mrc != CURLM_OK )
            echo "Curl multi read error $mrc\n";

        #Get content foreach session, retry if applied
        foreach ( $this->sessions as $i => $url )
        {
            $code = $this->info( $i, CURLINFO_HTTP_CODE );
            $codes[] = $code;
            if( $code[0] > 0 && $code[0] < 400 )
                $res[] = curl_multi_getcontent( $this->sessions[$i] );
            else
            {
                if( $this->retry > 0 )
                {
                    $retry = $this->retry;
                    $this->retry -= 1;
                    $eRes = $this->execSingle( $i );

                    if( $eRes )
                        $res[] = $eRes;
                    else
                        $res[] = false;

                    $this->retry = $retry;
                    echo '1';
                }
                else
                    $res[] = false;
            }
            curl_multi_remove_handle( $mh, $this->sessions[$i] );
        }

        curl_multi_close( $mh );
        $all[] = $res;
        $all[] = $codes;
        return $all;
    }

    /**
    * Closes cURL sessions
    * @param $key int, optional session to close
    */
    public function close( $key = false )
    {
        if( $key === false )
        {
            foreach( $this->sessions as $session )
                curl_close( $session );
        }
        else
            curl_close( $this->sessions[$key] );
    }

    /**
    * Remove all cURL sessions
    */
    public function clear()
    {
        foreach( $this->sessions as $session )
            curl_close( $session );
        unset( $this->sessions );
    }

    /**
    * Returns an array of session information
    * @param $key int, optional session key to return info on
    * @param $opt constant, optional option to return
    */
    public function info( $key = false, $opt = false )
    {
        if( $key === false )
        {
            foreach( $this->sessions as $key => $session )
            {
                if( $opt )
                    $info[] = curl_getinfo( $this->sessions[$key], $opt );
                else
                    $info[] = curl_getinfo( $this->sessions[$key] );
            }
        }
        else
        {
            if( $opt )
                $info[] = curl_getinfo( $this->sessions[$key], $opt );
            else
                $info[] = curl_getinfo( $this->sessions[$key] );
        }

        return $info;
    }

    /**
    * Returns an array of errors
    * @param $key int, optional session key to retun error on
    * @return array of error messages
    */
    public function error( $key = false )
    {
        if( $key === false )
        {
            foreach( $this->sessions as $session )
                $errors[] = curl_error( $session );
        }
        else
            $errors[] = curl_error( $this->sessions[$key] );

        return $errors;
    }

    /**
    * Returns an array of session error numbers
    * @param $key int, optional session key to retun error on
    * @return array of error codes
    */
    public function errorNo( $key = false )
    {
        if( $key === false )
        {
            foreach( $this->sessions as $session )
                $errors[] = curl_errno( $session );
        }
        else
            $errors[] = curl_errno( $this->sessions[$key] );

        return $errors;
    }

}

?>

The result is :

cUrl Timing : 3.5252928733826

Multi cUrl Timing : 0.63891220092773

Array
(
    [0] => Array
        (
            [0] => 200
        )

    [1] => Array
        (
            [0] => 200
        )

    [2] => Array
        (
            [0] => 200
        )

    [3] => Array
        (
            [0] => 200
        )

    [4] => Array
        (
            [0] => 200
        )

    [5] => Array
        (
            [0] => 200
        )

    [6] => Array
        (
            [0] => 200
        )

    [7] => Array
        (
            [0] => 200
        )

    [8] => Array
        (
            [0] => 200
        )

    [9] => Array
        (
            [0] => 200
        )

)

If i can help you, i'll feel very happy .

Sign up to request clarification or add additional context in comments.

2 Comments

Now it's much better, +1. Are you keen to drop the other answer?
i delete my other answer to not mislead the users

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.