
I've been experimenting with asynchronous database requests in PHP using mysqlnd. The code works correctly, but when I compare pulling data from one reasonably sized table against pulling the same data split across multiple tables with asynchronous requests, I'm not getting anything like the performance I expected. The results also seem to vary considerably with the hardware setup.

As I understand it, rather than:

x = a + b + c + d

I should be achieving:

x = max(a, b, c, d)

where x is the total time taken and a to d are the times of the individual requests. What I am actually seeing is a fairly minor improvement on some setups and, on others, worse performance, as if the requests weren't asynchronous at all. Any thoughts or experiences from others who have worked with this and come across the same issue are welcome.

EDIT: Measuring the timings: the queries are spread over 10 tables. Individually, no query takes more than around 8 seconds to complete, and the times of the individual requests run one after another (not asynchronously) add up to around 18 seconds.

Performing the same requests asynchronously, the total query time is also around 18 seconds, so the requests are clearly not being executed in parallel against the database.
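Plugging the numbers from this edit into the model above, with t_1 to t_10 standing for the ten per-table query times (the same role a to d play in the four-request case), gives roughly the following. This assumes the ten queries could genuinely run concurrently without competing for the same resources.

```latex
\begin{aligned}
x_{\text{sync}} &= \sum_{i=1}^{10} t_i \approx 18\ \text{s} \\
x_{\text{async, ideal}} &= \max(t_1, \dots, t_{10}) \lesssim 8\ \text{s} \\
x_{\text{async, observed}} &\approx 18\ \text{s} \approx x_{\text{sync}}
\end{aligned}
```

So under that assumption the asynchronous run should have finished in around 8 seconds or less, whereas it matched the sequential total.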

EDIT: The code used is exactly as shown in the documentation here:

<?php
$link1 = mysqli_connect();
$link1->query("SELECT 'test'", MYSQLI_ASYNC);
$all_links = array($link1);
$processed = 0;
do {
    $links = $errors = $reject = array();
    foreach ($all_links as $link) {
        $links[] = $errors[] = $reject[] = $link;
    }
    if (!mysqli_poll($links, $errors, $reject, 1)) {
        continue;
    }
    foreach ($links as $link) {
        if ($result = $link->reap_async_query()) {
            print_r($result->fetch_row());
            if (is_object($result))
                mysqli_free_result($result);
        } else die(sprintf("MySQLi Error: %s", mysqli_error($link)));
        $processed++;
    }
} while ($processed < count($all_links));
?>
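The documentation example above sends one query over one link. A single MySQL connection can only execute one statement at a time, so getting anywhere near x = max(a, b, c, d) requires one connection per concurrent query. The sketch below illustrates that under stated assumptions; it is not code from the question. The helper name `fan_out_queries` and the `$connect` factory are hypothetical, and it assumes PHP 7.4+ with mysqli on mysqlnd (which `MYSQLI_ASYNC` requires). Unlike the single-link example, it also drops each link from the polling set once its result has been reaped.

```php
<?php
// Sketch (hypothetical helper): send each query over its OWN connection so the
// MySQL server can execute them concurrently, then reap results as they finish.
// $connect is assumed to return a new, already-connected mysqli object per call.
function fan_out_queries(callable $connect, array $queries, int $pollSeconds = 1): array
{
    $pending = []; // spl_object_id(link) => [link, query key]
    foreach ($queries as $key => $sql) {
        $link = $connect();               // one connection per in-flight query
        $link->query($sql, MYSQLI_ASYNC); // returns immediately, query keeps running
        $pending[spl_object_id($link)] = [$link, $key];
    }

    $results = [];
    while ($pending) {
        $read = $error = $reject = [];
        foreach ($pending as [$link]) {
            $read[] = $error[] = $reject[] = $link;
        }
        // Wait up to $pollSeconds for at least one link to have a result ready.
        if (mysqli_poll($read, $error, $reject, $pollSeconds) < 1) {
            continue;
        }
        foreach ($read as $link) {
            $id = spl_object_id($link);
            $key = $pending[$id][1];
            unset($pending[$id]); // a reaped link must not be polled again
            $result = $link->reap_async_query();
            if ($result === false) {
                throw new RuntimeException("Query '$key' failed: " . $link->error);
            }
            if ($result instanceof mysqli_result) {
                $results[$key] = $result->fetch_all(MYSQLI_ASSOC);
                $result->free();
            } else {
                $results[$key] = $result; // true for statements without a result set
            }
            $link->close();
        }
    }
    return $results;
}
```

For example, `fan_out_queries(fn() => new mysqli('localhost', 'user', 'pass', 'db'), ['p0' => 'SELECT * FROM part_0', 'p1' => 'SELECT * FROM part_1'])`, where the credentials and `part_N` table names are placeholders. Wrapping the call in `microtime(true)` measurements gives a wall time to compare against the sum and the maximum of the individual query times. Separate connections only remove the client-side serialization; the queries still compete for the same disk and server resources.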
  • How did you measure performance, what exactly did you expect, and what code did you use to perform the asynchronous database requests in the first place? Async db communication won't yield any performance in majority of use cases of PHP (this has a lot to do with how the entire stack operates). Commented Aug 21, 2015 at 12:41
  • Implemented as per the PHP documentation; I expected something close to what I describe above, max(a, b, c, d). 'Async db communication won't yield any performance in majority of use cases' - can you explain this? Commented Aug 21, 2015 at 12:55
  • The idea behind anything asynchronous is that an event interface provided by the OS (epoll, kqueue, IOCP) is used so that the CPU can do something else while the data isn't there yet. While the data is being delivered between MySQL and PHP, what exactly is the rest of your code doing? Also, whether the "delivery" is async or sync, the same amount of data is delivered through the same unreliable network; nothing can really be faster there. I can't see your code or the SAPI you use, but my comment is valid for the majority of PHP use cases out there. Commented Aug 21, 2015 at 13:01
  • I'm not sure that applies here. The use case is splitting a single request, which would normally run against one large table, across multiple tables that each contain part of the data, using multiple asynchronous requests. Are you suggesting the bottleneck is not running the query against the data but getting the data to PHP? I'm not sure that sounds plausible, given that we are talking about the same amount of data as the single request against the larger table. Commented Aug 21, 2015 at 13:15
  • That use case gains you no performance. What is the PHP engine doing while it waits for the data to arrive? An asynchronous approach, done properly, lets each PHP process do something else until the data arrives. What exactly are your processes doing in the meantime? Also, it's not really true that you gained any performance with your setup. I suggest posting some code so we can take a better look and potentially help you get the performance you want. However, async MySQL reads won't speed anything up; they just let you use the CPU a bit better. Commented Aug 21, 2015 at 13:31
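The point made in these comments, that async I/O only pays off if the PHP process does something useful while it waits, can be sketched with a zero-timeout poll. This is an illustration under assumptions, not an established pattern from the thread: `query_while_working` and the `$doWork` callback (one small unit of CPU work per call, returning false when nothing is left) are hypothetical, and the union return type needs PHP 8.0+.

```php
<?php
// Sketch (hypothetical helper): keep doing other CPU work while one async query
// runs, instead of blocking on the result. $doWork is assumed to perform one
// small unit of work per call and return false once there is nothing left.
function query_while_working(mysqli $link, string $sql, callable $doWork): mysqli_result|bool
{
    $link->query($sql, MYSQLI_ASYNC);
    $hasWork = true;
    while (true) {
        $read = $error = $reject = [$link];
        // Timeout 0 only checks readiness while there is work left;
        // once the work is done, block for up to 1 second per poll.
        $ready = mysqli_poll($read, $error, $reject, $hasWork ? 0 : 1);
        if ($ready === false) {
            throw new RuntimeException('mysqli_poll failed');
        }
        if ($ready > 0) {
            return $link->reap_async_query();
        }
        if ($hasWork) {
            $hasWork = $doWork() !== false;
        }
    }
}
```

This overlaps PHP's own CPU work with the wait; it does not make the query itself return any sooner.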

1 Answer


I'll expand on my comments and try to explain why you won't gain any performance with your current setup.

Asynchronous, in your case, means that retrieving the data is asynchronous relative to the rest of your code. The two moving parts, getting the data and working with it, are separate: they run one after another, and the second only starts once the data arrives.

The point is to use the CPU to its fullest: you don't invoke the PHP code that works on the data until the data is ready.

For that to work, you must take control of the PHP process and make it use one of the operating system's event interfaces (epoll on Linux, IOCP on Windows). Since PHP is either embedded in a web server (mod_php) or runs as its own standalone FastCGI server (php-fpm), asynchronous data fetching is best utilized in a CLI PHP script; it's quite difficult to make use of event interfaces otherwise.

However, let's focus on your problem and why your code isn't faster.

You assumed that you were CPU bound, and your solution was to retrieve the data in chunks and process it that way. That's a reasonable idea; however, since nothing you do yields faster execution, you are 100% I/O bound.
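One way to put a number on "nothing you do yields faster execution" is to see where the measured async wall time falls between the two bounds from the question: the sum of the individual times (fully serial) and their maximum (fully parallel). The helper below is a hypothetical illustration (the name `overlap_fraction` is made up), assuming each query's time has been measured on its own and the total wall time of the async run has been measured separately.

```php
<?php
/**
 * Hypothetical helper: where does the observed async wall time sit between
 * the fully serial bound (sum of timings, x = a + b + c + d) and the fully
 * parallel bound (max of timings, x = max(a, b, c, d))?
 * 1.0 = perfect overlap, 0.0 = no overlap, negative = slower than serial.
 */
function overlap_fraction(array $timings, float $observed): float
{
    $serial = array_sum($timings);
    $parallel = max($timings);
    if ($serial <= $parallel) {
        throw new InvalidArgumentException('Need at least two non-zero timings.');
    }
    return ($serial - $observed) / ($serial - $parallel);
}
```

A result near 0 means the "asynchronous" requests were effectively serialized somewhere (client, server or disk); a negative result means the concurrent run was slower than running the queries one after another, which is what this answer predicts when every worker contends for the same disk.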

Retrieving data from a database forces the hard disk to seek. No matter how much you "chunk" the work, if the disk is slow and the data is scattered around it, that part will be slow, and creating more workers that each handle part of the data will just make the system slower and slower, since every worker has the same problem retrieving its data.

I'd conclude that your issue lies in a slow hard disk and a data set that is too big and possibly not structured well for chunked retrieval. I suggest updating this question, or asking a new one, focused on retrieving the data faster and more efficiently.
