
I have a very large database of images and I need to run an update to increment the view count on the images. Every hour there are over one million unique rows to update, and right now the query takes about an hour to run. Is there any way to make it run faster?

I'm creating a memory table:

CREATE TABLE IF NOT EXISTS tmp_views_table (
    `key` VARCHAR(7) NOT NULL,
    views INT NOT NULL,
    PRIMARY KEY ( `key` )
) ENGINE = MEMORY;

Then I insert 1,000 rows at a time, looping until all the views have been inserted into the memory table:

INSERT LOW_PRIORITY INTO tmp_views_table
VALUES ('key', count), ('key', count), ('key', count), etc...

Then I run an update on the actual table like this:

UPDATE images, tmp_views_table
SET images.views = images.views + tmp_views_table.views
WHERE images.key = tmp_views_table.key;

This last update is the one that takes around an hour; the memory-table work runs quickly.

Is there a faster way I can do this update?

  • A numeric ID would make more sense than a varchar. Also, is your table indexed? Commented Jan 5, 2012 at 18:37
  • I'm not sure where the indexes for memory tables are saved, but aren't they slowing down these insert/update operations, especially in the case of the MEMORY engine? Commented Jan 5, 2012 at 18:58
  • @OliCharlesworth The ID is a hash, so it contains letters and numbers, and yes, the primary key is the key (the hash). Commented Jan 5, 2012 at 18:58
  • @Brian: I might be wrong, but my intuition is that indexing on an integer would be far faster than on a varchar. You should strongly consider revising your app to refer to images by numeric ID. Commented Jan 5, 2012 at 19:04
  • @Rolice So I shouldn't use indexes for the memory table? Commented Jan 5, 2012 at 19:06

1 Answer


You are using InnoDB, right? Try general tuning of MySQL and the InnoDB engine to allow for faster data changes.

I suppose you have an index on the key field of the images table. You can also try your update query without an index on the memory table; in that case the query optimizer should choose a full table scan of the memory table.
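For example, a sketch of the staging table without the primary key (this assumes you aggregate duplicate keys yourself before inserting, since the table would no longer reject them):

```sql
-- Index-less variant of the MEMORY staging table: the optimizer
-- scans this small table and does keyed lookups into images.
CREATE TABLE IF NOT EXISTS tmp_views_table (
    `key` VARCHAR(7) NOT NULL,
    views INT NOT NULL
) ENGINE = MEMORY;
```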

I have never used joins with UPDATE statements, so I don't know exactly how it is executed, but maybe the JOIN is taking too long. Maybe you can post the EXPLAIN output for that query.
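Older MySQL versions cannot EXPLAIN an UPDATE directly, so one way to get the plan (a sketch, using your table and column names) is to rewrite the join as the equivalent SELECT:

```sql
-- The join part of the UPDATE rewritten as a SELECT so that EXPLAIN
-- works even on MySQL versions that cannot explain UPDATE statements.
EXPLAIN
SELECT images.views, tmp_views_table.views
FROM images, tmp_views_table
WHERE images.key = tmp_views_table.key;
```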

Here is what I have used in one project to do something similar - insert/update real-time data into a temp table and merge it into an aggregate table once a day - so you can try whether it executes faster:

INSERT INTO st_views_agg (pageid, pagetype, day, count)
  SELECT pageid, pagetype, DATE(`when`) AS day, COUNT(*) AS count
  FROM st_views_pending
  WHERE (pagetype = 4)
  GROUP BY pageid, pagetype, day
ON DUPLICATE KEY UPDATE count = count + VALUES(count);
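Adapted to your schema it might look like the sketch below. This assumes images.`key` is the primary (or a unique) key, so the ON DUPLICATE KEY branch fires for every existing row; note that any key in the staging table that is *not* yet in images would be inserted as a new row with only these two columns, so filter those out first if that is not what you want:

```sql
-- Sketch: merge the staged counts into images in one statement.
-- Relies on images.`key` being PRIMARY KEY or UNIQUE.
INSERT INTO images (`key`, views)
  SELECT `key`, views FROM tmp_views_table
ON DUPLICATE KEY UPDATE views = views + VALUES(views);
```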
