
I have a products database that synchronizes with product data every morning.

The process is very clear (a rough sketch in code follows the list):

  • Get all products from the database with a query
  • Loop through all products, and fetch an XML from the other server by product_id
  • Update the data from the XML
  • Log the changes to a file
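
In code, the loop looks roughly like this (a schematic sketch only; the helper functions are placeholders, the real script is hundreds of lines long):

    // Schematic of the daily sync; the helper names below are placeholders.
    $productsToSync = $db->loadObjectList();       // 1. get all products by query
    foreach ($productsToSync as $product) {
        $xml = fetchXml($product->product_id);     // 2. fetch XML from the other server
        updateProductFromXml($product, $xml);      // 3. update data from the XML
        logChange($product);                       // 4. log the change to file
    }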

If I query a low number of items, limiting it to 500 random products for example, everything goes fine. But when I query all products, my script SOMETIMES goes on the fritz and starts looping multiple times. Hours later I still see my log file growing and products being added.

I checked everything I could think of, for example:

  • Are any variables used twice without being overwritten in between? (No)
  • Does the function call itself? (No)
  • Does it happen with a low number of products too? (No)
  • The script is called using a cronjob; are those settings OK? (Yes)

What makes it especially weird is that it sometimes goes right and sometimes it doesn't. Could this be some memory problem?

EDIT: The script is called from Webmin at a specific hour and minute:

    wget -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync

Code is hundreds of lines long...

Thanks

  • Is it possible that you get the same product multiple times? This might happen if your query is quite big with multiple joins, which may multiply your result set. Commented Mar 27, 2015 at 15:26
  • Would you please show us your code? It might be a condition in your methods. Commented Mar 27, 2015 at 15:27
  • I think we're going to need to see some of your scripts - the cronjob and the script it runs to get much of a handle on this. Commented Mar 27, 2015 at 15:28
  • You could also post your cronjob code, which might be called more frequently than you expect. Commented Mar 27, 2015 at 15:29
  • Hello @HansWassink, I assume that you are using some kind of framework (I guess it's Zend). After reviewing your whole code, I figured out one major point: never run a SELECT query within a loop; this can be very deadly. If possible, it's better to run that query only once and then do everything with that data in memory. From my experience I learned a very useful technique, which is the ARRAY; it is very powerful. If you can do that, it should fix your problem and improve performance. Commented Mar 31, 2015 at 16:29

4 Answers


You have:

  • max_execution_time disabled. Your script won't end until the process is complete, for as long as it needs.
  • memory_limit disabled. There is no limit to how much data can be stored in memory.

500 records completed without issues. This indicates that the script finishes its work before the next cronjob iteration. For example, if your cron runs every hour, the 500 records are processed in less than an hour.

If you have a cronjob that is going to process a large number of records, consider adding a lock mechanism to the process: only allow the script to run once, and start it again only when the previous run is complete.

You can create a script lock as part of a shell script before executing your PHP script (a sketch of that follows after the class below). Or, if you don't have access to your server, you can use a database lock within the PHP script, something like this:

class ProductCronJob
{
    protected $lockValue;

    public function run()
    {
        // Only run the sync when we hold the lock
        if ($this->obtainLock()) {
            $this->syncProducts();

            // Release the lock when the process is complete
            $this->releaseLock();
        }
    }

    protected function syncProducts()
    {
        // your long-running script
    }

    protected function obtainLock()
    {
        // A unique value per run, so our lock is distinguishable
        $timestamp       = (new \DateTime)->getTimestamp();
        $this->lockValue = $timestamp . '_syncProducts';

        $db = JFactory::getDbo();

        // lock = '0' indicates that the cronjob is not active. The second
        // WHERE clause makes the claim atomic: the UPDATE only matches when
        // no other process holds the lock. (#__cronlock is an assumed table
        // with the columns name, lock and timemodified.)
        $query = $db->getQuery(true)
            ->update($db->quoteName('#__cronlock'))
            ->set($db->quoteName('lock') . ' = ' . $db->quote($this->lockValue))
            ->set($db->quoteName('timemodified') . ' = ' . $db->quote($timestamp))
            ->where($db->quoteName('name') . ' = ' . $db->quote('syncProducts'))
            ->where($db->quoteName('lock') . ' = ' . $db->quote('0'));
        $db->setQuery($query)->execute();

        if ($db->getAffectedRows() === 0) {
            // Currently there is an active process - can't start a new one.
            // Instead of bailing out straight away, you can add extra logic:
            // read timemodified back, and if the current lock is older than,
            // say, 25200 seconds (7 hours), send a notification email to the
            // site administrator.

            return false;
        }

        return true;
    }

    protected function releaseLock()
    {
        $db = JFactory::getDbo();

        // Mark the cronjob as no longer active
        $query = $db->getQuery(true)
            ->update($db->quoteName('#__cronlock'))
            ->set($db->quoteName('lock') . ' = ' . $db->quote('0'))
            ->where($db->quoteName('name') . ' = ' . $db->quote('syncProducts'));
        $db->setQuery($query)->execute();
    }
}
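
For the shell-script variant mentioned above, a minimal sketch using flock(1) could look like this (the lock-file path is a placeholder; -n makes it skip the run entirely if the previous one still holds the lock):

#!/bin/sh
# Skip this run if the previous sync still holds the lock file.
flock -n /tmp/syncProducts.lock \
    wget -q -O /dev/null 'http://example.eu/xxxxx/cron.php?operation=sync'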

3 Comments

  • Hey there Satrun, thanks for your reply. I will try the lock mechanism, but the thing is: I can see in the log that all the products are updated within 45 minutes, and the cron only runs once a day.
  • @Hans - please add your cron script to the question. Maybe there's something wrong with it.
  • @HansWassink the lock mechanism should eliminate one possible cause of your issue. If the issue is still there after that, then the problem is in the logic of the script for sure. Also make sure you remove the die() from the end of your script before using the lock.

Your script runs for quite some time (~45 minutes), and wget thinks it's "timing out" since you don't return any data. By default wget has a 900-second timeout value and a retry count of 20, so first you should change your wget command to prevent this:

wget --tries=0 --timeout=0 -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync

Now, removing the timeout could lead to other issues, so instead you could send data from your script (and flush it, to force the webserver to actually send it) to make sure wget doesn't think the script has "timed out"; for example, output something every 1000 loops. Think of it as a progress bar, sketched below...
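
A minimal sketch of that heartbeat idea, assuming the sync loop over $productsToSync from the question:

$i = 0;
foreach ($productsToSync as $product) {
    // ... fetch the XML and update the product ...

    if (++$i % 1000 === 0) {
        echo '.';               // any output keeps the HTTP connection alive
        if (ob_get_level() > 0) {
            ob_flush();         // flush PHP's own output buffer first, if any
        }
        flush();                // ask the webserver to push the data to wget
    }
}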

Just keep in mind that you will hit an issue when the run time gets close to your cron period, as two runs will then execute in parallel. You should optimize your process and/or add a lock mechanism, maybe?

Comments


I see two possibilities:

  • cron calls the script much more often than intended
  • the script somehow takes too long

You can try to estimate the time a single iteration of the loop takes; this can be done with time(). Perhaps the result is surprising, perhaps not. You can probably get the number of results too. Multiply the two, and you will have an estimate of how long the whole process should take; a sketch follows below.
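
For instance, a rough sketch (using microtime() for sub-second resolution; $productsToSync is from the code quoted below):

$start = microtime(true);
// ... run one iteration of the sync loop here ...
$perItem = microtime(true) - $start;    // seconds for a single product

$total = count($productsToSync);
printf("~%.2fs per product, ~%.1f minutes for all %d products\n",
    $perItem, $perItem * $total / 60, $total);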

$productsToSync = $db->loadObjectList();

and

foreach ($productsToSync AS $product) {

It seems you load every result into an array. This won't work for huge databases, because obviously a million rows won't fit in memory; you should fetch just one result at a time. With MySQL there are methods that fetch one row at a time from the resource; I hope your database layer allows the same (see the sketch below).
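
With plain mysqli that could look like this (a sketch; the connection details and query are placeholders). MYSQLI_USE_RESULT makes the result unbuffered, so only one row is held in PHP's memory at a time:

$mysqli = new mysqli('localhost', 'user', 'pass', 'shop');
$result = $mysqli->query('SELECT * FROM products', MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    // process one product; the full result set never sits in PHP memory
}
$result->free();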

I also see you execute another query in each iteration of the loop. This is something I try to avoid. Perhaps you can move it to after the first query has ended and do all of those in one big query, as sketched below? On the other hand, this may bite my first suggestion.
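
A sketch of that, reusing the Joomla-style $db object from the question (the table and column names here are guesses):

// Collect the IDs once, then fetch all the related rows in one query
// instead of issuing one SELECT per loop iteration.
$ids = array_map(function ($p) { return (int) $p->product_id; }, $productsToSync);

$db->setQuery(
    'SELECT product_id, stock FROM #__product_stock'
    . ' WHERE product_id IN (' . implode(',', $ids) . ')'
);
$extraById = $db->loadObjectList('product_id');   // rows keyed by product_id

foreach ($productsToSync as $product) {
    $extra = isset($extraById[$product->product_id]) ? $extraById[$product->product_id] : null;
    // ... update the product from its XML, using $extra as needed ...
}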

Also, if something goes wrong, be paranoid when debugging: measure as much as you can, and time as much as you can when it's a performance issue. Put the timings in your log file; usually you will find the bottleneck.

Comments


I solved the problem myself. Thanks for all the replies!

My MySQL connection timed out; that was the problem. As soon as I added:

    ini_set('mysql.connect_timeout', 14400);
    ini_set('default_socket_timeout', 14400);

to my script, the problem stopped. I really hope this helps someone. I'll upvote all the locking answers, because those were very helpful!
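
As a related guard (not part of the original fix, just a common pattern), you can also ping the connection and reconnect before writing, in case MySQL closed it while a slow XML fetch was in progress:

    // Sketch with mysqli; the connection parameters are placeholders.
    if (!$mysqli->ping()) {
        $mysqli = new mysqli('localhost', 'user', 'pass', 'shop');
    }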

2 Comments

  • I know, it's weird, but I tested it 10 times. Without it, the script starts looping when I feed it more than 2500 products; with those rules, everything goes smooth as a baby seal.
  • Helped me a lot. :D
