0

I have a remote vendor mysql database that contains retail item data. I created a php script to select this data in one data set, process it, and insert it into a local database over 3 tables.

My script works as intended, but by the end of the script, memory usage seems really high.

The longest part of the script is running the query to select the remote data. This query contains some joins that take about 190 seconds to run and retrieve around 100,000 rows.

The overhead for starting the script and fetching the remote data is about 35MB of RES memory. When the remote query finishes, the data is processed and inserted locally in around 10 seconds. During this 10 seconds, the memory usage of the script jumps from ~35MB to 300MB by the end.

This seems like a lot of memory usage for this simple task. It almost seems like the garbage collector isn't running.

Reading around about PHP's garbage collection, I have attempted to wrap parts of my code in functions. It was noted that this helps with garbage collection. It did not in my case.

I have attempted to run the garbage collection manually using gc_collect_cycles() but this has not made a difference (It returns 0 cycles each time I run it). I attempted to run it after each iteration of 5957 items.

I have attempted to unset() and set things to null during the end of iterations, but this doesn't seem to free the memory.

I installed the memprof extension to take a look at what was eating a lot of memory. It seems explode() and PDOStatement::fetch() are using the most by far. It seems the memory isn't getting freed each iteration. How would this be freed?

Note: In my script I split the local processing of items into groups of 5957, this is due to hitting limits with the binding of parameters. Each item gets 11 parameters bound (5957 * 11 = 65527; just under the limit of 65535).

Local environment:

Linux 4.4.0-17763-Microsoft #379-Microsoft x86_64 GNU/Linux (DEBIAN WSL)
PHP 7.0.33-0+deb9u3 (cli)
mysqlnd 5.0.12-dev - 20150407

Script:

<?php

ini_set('memory_limit', '-1');
set_time_limit(0);
$start = time();

// Step size for processing local inserts
$items_per_step = 5957;

// PDO options
$db_options = [
    PDO::ATTR_TIMEOUT => 10,
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC
];

// Queries
$fetch_remote_query = file_get_contents(__DIR__ . '/sql/fetch_remote.sql');
$item_query = file_get_contents(__DIR__ . '/sql/add_local.sql');
$about_query = file_get_contents(__DIR__ . '/sql/add_about.sql');
$filters_query = file_get_contents(__DIR__ . '/sql/add_filters.sql');

try {

    // Connect to databases
    $remotedw = new PDO('dsn', 'user', 'pass', $db_options);
    $localdw = new PDO('dsn', 'user', 'pass', $db_options);

    // Fetch remote
    echo 'Fetching items from the Remote database...' . PHP_EOL;;
    $items = $remotedw->query($fetch_remote_query);
    $item_count = $items->rowCount();
    echo "$item_count items fetched and ready for caching" . PHP_EOL;;

    // Calculate steps
    $steps_required = ceil($item_count / $items_per_step);
    echo "Processing items in $steps_required steps" . PHP_EOL;;

    // Run steps
    for ($steps_taken = 1, $offset = 0; $steps_taken <= $steps_required; $steps_taken++, $offset += $items_per_step) {

        // Step length
        $length = $steps_taken * $items_per_step > $item_count ? $item_count - $offset : $items_per_step;

        // Initial local query parts for the current step
        $item_rows = '';
        $about_rows = '';
        $filter_rows = '';
        $item_data = [];
        $about_data = [];
        $filter_data = [];

        // Step through items
        for($i = 0; $i < $length; $i++) {

            // Fetch next row
            $item = $items->fetch();

            // Build items
            $item_rows .= '(?,?,?,?,?,?,?,?,?,?,?),';
            $item_data[] = $item['sku'];
            $item_data[] = $item['mfg_number'];
            $item_data[] = $item['handling'];
            $item_data[] = $item['taxable'];
            $item_data[] = $item['price'];
            $item_data[] = $item['qty_available'];
            $item_data[] = $item['department'];
            $item_data[] = $item['class'];
            $item_data[] = $item['description'];
            $item_data[] = $item['sales_to_date'];
            $item_data[] = $item['show_on_web'];

            // Build about
            foreach (explode('*', $item['about']) as $about_entry) {
                if ($about_entry === '') continue;
                $about_rows .= '(?,?),';
                $about_data[] = $item['sku'];
                $about_data[] = $about_entry;
            }

            // Build filters
            if ($item['fineline']) {
                $filter_rows .= '(?,?),';
                $filter_data[] = $item['sku'];
                $filter_data[] = $item['fineline'];
            }

        }

        // Add items
        $localdw
            ->prepare(str_replace('{{rows}}', rtrim($item_rows, ','), $item_query))
            ->execute($item_data);

        // Add about (sometimes items do not have about data, so check if there are rows)
        if ($about_rows) $localdw
            ->prepare(str_replace('{{rows}}', rtrim($about_rows, ','), $about_query))
            ->execute($about_data);

        // Add filters (sometimes items do not have filter data, so check if there are rows)
        if ($filter_rows) $localdw
            ->prepare(str_replace('{{rows}}', rtrim($filter_rows, ','), $filters_query))
            ->execute($filter_data);

    }

} catch (PDOException $exception) {
    echo $exception->getMessage() . PHP_EOL;
}

echo 'Script finished in ' . (time() - $start) . ' seconds' . PHP_EOL;

1 Answer 1

1

I think rowCount() may be causing it to buffer all the results, as if you'd called $items->fetchAll().

Instead of using a for loop, use a while loop that gathers the results and performs the batched queries when the iteration variable is a multiple of the step size.

$i = 0;
$item_rows = '';
$about_rows = '';
$filter_rows = '';
$item_data = [];
$about_data = [];
$filter_data = [];

while ($item = $items->fetch()) {
    $item_rows .= '(?,?,?,?,?,?,?,?,?,?,?),';
    $item_data[] = $item['sku'];
    $item_data[] = $item['mfg_number'];
    $item_data[] = $item['handling'];
    $item_data[] = $item['taxable'];
    $item_data[] = $item['price'];
    $item_data[] = $item['qty_available'];
    $item_data[] = $item['department'];
    $item_data[] = $item['class'];
    $item_data[] = $item['description'];
    $item_data[] = $item['sales_to_date'];
    $item_data[] = $item['show_on_web'];

    // Build about
    foreach (explode('*', $item['about']) as $about_entry) {
        if ($about_entry === '') continue;
        $about_rows .= '(?,?),';
        $about_data[] = $item['sku'];
        $about_data[] = $about_entry;
    }

    // Build filters
    if ($item['fineline']) {
        $filter_rows .= '(?,?),';
        $filter_data[] = $item['sku'];
        $filter_data[] = $item['fineline'];
    }

    if (++$i == $items_per_step) {
        $localdw
            ->prepare(str_replace('{{rows}}', rtrim($item_rows, ','), $item_query))
            ->execute($item_data);

        // Add about (sometimes items do not have about data, so check if there are rows)
        if ($about_rows) $localdw
            ->prepare(str_replace('{{rows}}', rtrim($about_rows, ','), $about_query))
            ->execute($about_data);

        // Add filters (sometimes items do not have filter data, so check if there are rows)
        if ($filter_rows) $localdw
            ->prepare(str_replace('{{rows}}', rtrim($filter_rows, ','), $filters_query))
            ->execute($filter_data);

        $i = 0;
        $item_rows = '';
        $about_rows = '';
        $filter_rows = '';
        $item_data = [];
        $about_data = [];
        $filter_data = [];
    }
}
if ($i > 0) {
    // process the last batch
    $localdw
        ->prepare(str_replace('{{rows}}', rtrim($item_rows, ','), $item_query))
        ->execute($item_data);

    // Add about (sometimes items do not have about data, so check if there are rows)
    if ($about_rows) $localdw
        ->prepare(str_replace('{{rows}}', rtrim($about_rows, ','), $about_query))
        ->execute($about_data);

    // Add filters (sometimes items do not have filter data, so check if there are rows)
    if ($filter_rows) $localdw
        ->prepare(str_replace('{{rows}}', rtrim($filter_rows, ','), $filters_query))
        ->execute($filter_data);
}
Sign up to request clarification or add additional context in comments.

1 Comment

I did some more research on this. It seems that the second the query finishes running it buffers the whole result. I set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false and it seems to help by only creeping up slowly over the life of the program (rather than all at once). Your method is a lot cleaner though. I implemented it, but there still is a lot of memory creep. It almost seems like every time fetch() is run it never frees the fetched result.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.