
I am currently working with a REST API provided by a third-party tool that requires me to make thousands of requests per run of my script.

For the most part everything works well; the only issue is that it takes some time. Now that the logic is finished, I am looking to improve the performance of the script by tuning the cURL requests.

Some notes:

  • Using a third-party app (Postman) I get a faster response on average per request: ~600 ms (Postman) vs ~1300 ms (PHP cURL). I was more or less able to reach that rate, so I think I have close to the best optimization I can get.

  • I am already using curl_multi in other parts of my script, but the part I am currently targeting has one cURL request depending on the return value of another.

  • These are all GET requests. I also have POST, PUT, DELETE, and PATCH, but those are used rather sparingly, so the area I am targeting is the sequential GET requests.

I have done some research, and for the most part everyone recommends curl_multi as the default, but I can't really use it here because the requests are chained (a minimal sketch of that pattern is below). Looking through the PHP documentation, I thought about going beyond my basic GET request and adding some more options.
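For context, this is roughly what the chaining looks like: each URL needs a value parsed out of the previous response before it can even be built, which is why the calls can't be batched into a single curl_multi pass. The endpoint paths and field names here are made up purely for illustration, using the executeGET helper shown below.

$project = json_decode($common->executeGET('projects/123'), true);
$issueId = $project['latestIssueId'];   // hypothetical field needed to build the next URL

$issue = json_decode($common->executeGET('issues/' . $issueId), true);
$authorId = $issue['authorId'];         // hypothetical field needed to build the next URL

$author = json_decode($common->executeGET('users/' . $authorId), true);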


Original Code

Below is my first, simplistic take: it creates a new cURL handle and sets the request type, return transfer, header, credentials, and SSL verification (the last was needed to get around the third-party tool moving to a cloud instance). For the specific query I was testing, this ran at about ~1460 ms per request.

(From test code below = 73sec/50runs = 1.46seconds)

function executeGET($getUrl)
{
    /* Curl Options */
    $ch = curl_init($this->URL . $getUrl);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    curl_setopt($ch, CURLOPT_USERPWD, /*INSERT CREDENTIALS*/);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    /* Result handling and processing */
    $result = curl_exec($ch);

    return $result;
}
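One thing that can help pin down where the ~1300 ms actually goes is calling curl_getinfo() right after curl_exec(): it breaks the total into DNS lookup, TCP connect, TLS handshake, and time-to-first-byte, which makes it obvious whether the cost is connection setup or the API itself. A minimal diagnostic sketch (not part of the original function above):

$result = curl_exec($ch);

// Per-phase timings in seconds; large dns/connect/tls values point at
// connection setup rather than the remote API being slow.
$timing = array(
    'dns'     => curl_getinfo($ch, CURLINFO_NAMELOOKUP_TIME),
    'connect' => curl_getinfo($ch, CURLINFO_CONNECT_TIME),
    'tls'     => curl_getinfo($ch, CURLINFO_APPCONNECT_TIME),
    'ttfb'    => curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME),
    'total'   => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
);
error_log(json_encode($timing));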

First Attempt at optimization

Next I went through existing Stack Overflow questions and the PHP documentation to look at different cURL options and see what would affect the request. The first thing I found was IP resolution: forcing it to IPv4 (CURLOPT_IPRESOLVE) sped it up by ~100 ms. The big one, however, was the encoding (CURLOPT_ENCODING), which sped it up by about ~300 ms.

(From test code below = 57sec/50runs = 1.14seconds)

function executeGET($getUrl)
{
    /* Curl Options */
    $ch = curl_init($this->URL . $getUrl);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    curl_setopt($ch, CURLOPT_USERPWD, /*INSERT CREDENTIALS*/);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 ); //NEW
    curl_setopt($ch, CURLOPT_ENCODING, ''); //NEW

    /* Result handling and processing */
    $result = curl_exec($ch);

    return $result;
}
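My understanding of why the empty-string CURLOPT_ENCODING helps: it makes curl send an Accept-Encoding header advertising every compression method the libcurl build supports and then decompress the response transparently, so large JSON payloads travel compressed. Which encodings get advertised depends on the build:

// '' advertises all supported encodings and decodes the response automatically;
// an explicit value restricts the request to that one encoding.
curl_setopt($ch, CURLOPT_ENCODING, '');       // e.g. Accept-Encoding: gzip, deflate (build-dependent)
// curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // only ever ask for gzip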

Re-using same curl object

The last thing I decided to try was using a single cURL handle and just changing the URL before each request. My logic was that all the options were the same; I was just querying a different endpoint. So I merged the newly discovered IPRESOLVE and ENCODING options with the handle re-use, and this is what I got in its most stripped-down version.

(From test code below = 32sec/50runs = 0.64seconds)

private $ch = null;
function executeREUSEGET($getUrl)
{
    /* Curl Options */
    if ($this->ch == null) {
        $this->ch = curl_init();
        curl_setopt($this->ch, CURLOPT_CUSTOMREQUEST, "GET");
        curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($this->ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
        curl_setopt($this->ch, CURLOPT_USERPWD, /*INSERT CREDENTIALS*/);
        curl_setopt($this->ch, CURLOPT_SSL_VERIFYPEER, false); //Needed to bypass SSL for multi calls
        curl_setopt($this->ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 ); //NEW
        curl_setopt($this->ch, CURLOPT_ENCODING, '');
    }
    curl_setopt($this->ch, CURLOPT_URL, $this->URL . $getUrl);

    /* Result handling and processing */
    $result = curl_exec($this->ch);
    return $result;
}
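Since the handle now lives for the lifetime of the object, I assume it should also be closed explicitly when the object goes away, and curl_getinfo() can confirm the connection is really being reused: on the second and later requests the connect/TLS phases typically report ~0 because no new handshake happens. A sketch of both, assuming they sit in the same class as executeREUSEGET (the helper name is made up):

function __destruct()
{
    /* Release the persistent handle once the object is destroyed */
    if ($this->ch !== null) {
        curl_close($this->ch);
        $this->ch = null;
    }
}

function lastRequestReusedConnection()
{
    /* On a reused keep-alive connection the setup phases are typically ~0 s */
    return curl_getinfo($this->ch, CURLINFO_CONNECT_TIME) < 0.001
        && curl_getinfo($this->ch, CURLINFO_APPCONNECT_TIME) < 0.001;
}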

Testing Code

Output from the test runs:

Reuse GET taken in seconds is: 32
Normal GET (with extra options) taken in seconds is: 58
Normal GET taken in seconds is: 73

$common = new Common();
$startTime = time();
for ($x = 0; $x < 50; $x++){
    $r = $common->executeREUSEGET('MY_COMPLEX_QUERY');
}
echo 'Reuse GET taken in seconds is: '.(time()-$startTime);
$startTime = time();
for ($x = 0; $x < 50; $x++){
    $r = $common->executeGET('MY_COMPLEX_QUERY');
}
echo 'Normal GET taken in seconds is: '.(time()-$startTime);
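One note on the benchmark itself: time() only has one-second resolution, so the per-request averages above are fairly coarse. The same loop with microtime(true) gives sub-millisecond resolution; a sketch using the same placeholder query:

$startTime = microtime(true);
for ($x = 0; $x < 50; $x++){
    $r = $common->executeREUSEGET('MY_COMPLEX_QUERY');
}
$elapsed = microtime(true) - $startTime;
printf("Reuse GET: %.2f s total, %.0f ms per request\n", $elapsed, ($elapsed / 50) * 1000);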

Conclusion

This was my thought process while going through my code and trying to speed up my requests to more closely match what I was seeing in Postman. I was hoping to get some feedback on what I can improve to speed up these specific GET requests, or whether there is anything else I can do.

EDIT: I didn't really do much testing with curl_multi in terms of optimization, but now that I have my default GET request down under a second, would it be better to convert those curl_multi calls to my executeREUSEGET? The difference is that when I use curl_multi I just need the data; none of it depends on previous input the way my executeGET calls do. (A sketch of keeping curl_multi while applying the same options is below.)
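From what I can tell, it shouldn't have to be one or the other for the independent requests: the multi handle keeps its own connection cache, so easy handles driven through the same curl_multi handle can also reuse connections, and the per-handle options (IPRESOLVE, ENCODING) apply just the same. A rough sketch, assuming the same URL/credentials fields as above (the method name is made up):

function executeMultiGET(array $getUrls)
{
    $mh = curl_multi_init();
    $handles = array();

    foreach ($getUrls as $key => $getUrl) {
        $ch = curl_init($this->URL . $getUrl);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
        curl_setopt($ch, CURLOPT_USERPWD, /*INSERT CREDENTIALS*/);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
        curl_setopt($ch, CURLOPT_ENCODING, '');
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    /* Run all transfers; the multi handle manages its own keep-alive pool */
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);
        }
    } while ($running && $status == CURLM_OK);

    /* Collect the bodies keyed the same way as the input URLs */
    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}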

  • If you must make "thousands of requests per run" of your script, then this is a textbook case where you should be using multithreading/multiprocessing. The primary limitation is that curl requests involve a lot of latency -- i.e., your computer is waiting around for the remote machine to respond. The easiest way to accomplish this would be to look at curl_multi. Commented Aug 7, 2019 at 17:17
  • @S.Imp I understand that, and I do use curl_multi at times where I can. When I just need data to be extracted where none of the calls rely on one another. But in the cases where I am using my singular requests, it is because the requests are chained upon another. Commented Aug 7, 2019 at 17:19
  • Would this not be better to run async? I am assuming this is part of a web based application, can it not be done in Ajax (node.js)? Load the minimal then load it async, it may take delays to load messages etc or whatever it is you stream data for, but it would be better to load the infrastructure then await data with a nice overlay on the specific div fields Commented Aug 7, 2019 at 17:26
  • @Jaquarh Thank you for your feedback! So at the time I started this project my web-based knowledge was pretty much raw HTML. I picked up PHP as it seemed the simplest to start with REST calls as it had the CURL library built in. This isn't really meant for a front-end type use, it has a very simple UI to execute these scripts but what they are first and foremost, is just data generation and extraction. I have about 15 different projects which all utilize these calls, and about 13 of them write the contents to a log file. Technically the user doesn't even need/want to see it until it is done! Commented Aug 7, 2019 at 17:30
  • Optimization with REST is a hit and miss, HTTP requests can vary in timescale for each individual request. If user interaction is not needed then perhaps optimization to the level you're expecting isn't necessary? If its huge, perhaps consider load balancing Commented Aug 7, 2019 at 17:33
