
I'm working with an application that has a MySQL database at Amazon RDS. The table in question is set up as follows:

CREATE TABLE `log` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `timestamp` datetime NOT NULL,
  `username` varchar(45) NOT NULL,
  .. snip some varchar and int fields ..
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

This system has been in beta for a while and already the dataset is quite huge and the queries are starting to be rather slow.

SELECT COUNT(*) FROM log --> 16307224 (takes 105 seconds to complete)

This table is pretty much only used to build one report from a query like this:

SELECT timestamp, username, [a few more] FROM log 
WHERE timestamp  BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00' 
AND username='XX' 

This typically returns between 1000 and 6000 rows and takes around 100-180 seconds to complete, meaning the web application will often time out and leave an empty report (I will look into the timeout as well, but this question is about the root cause).

I'm not very good with databases, but my guess is that it's the BETWEEN that's killing me here. What I'm thinking is that I should perhaps somehow use the timestamp as an index. Timestamp together with username should still provide uniqueness (I don't use the id field for anything).

If there's anyone out there with suggestions for optimizations I'm all ears.

UPDATE:

Table is now altered to the following

CREATE TABLE `log` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `timestamp` datetime NOT NULL,
  `username` varchar(45) NOT NULL,
  .. snip ..
  `task_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_un_ts` (`timestamp`,`username`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

EXPLAIN of the SELECT statement returns the following

id => 1
select_type => SIMPLE
table => log
type => range
possible_keys => index_un_ts
key => index_un_ts
key_len => 55
ref => 
rows => 52258
Extra => Using where; Using index
  • You can switch to MyISAM. Aggregate your data by cron daily, for example and store it in separate reporting table. Commented Apr 13, 2012 at 8:43
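
The daily aggregation suggested in this comment could look roughly like the sketch below. It only helps if the report can be built from aggregates rather than individual rows, which the question does not confirm. The table name `log_daily_summary` and its columns `day` and `entries` are hypothetical; `log`, `timestamp` and `username` come from the question.

```sql
-- Hypothetical reporting table: one row per user per day.
CREATE TABLE `log_daily_summary` (
  `day` date NOT NULL,
  `username` varchar(45) NOT NULL,
  `entries` int unsigned NOT NULL,
  PRIMARY KEY (`day`, `username`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Run once a day (for example from cron) to summarize yesterday's rows.
INSERT INTO `log_daily_summary` (`day`, `username`, `entries`)
SELECT DATE(`timestamp`), `username`, COUNT(*)
FROM `log`
WHERE `timestamp` >= CURDATE() - INTERVAL 1 DAY
  AND `timestamp` <  CURDATE()
GROUP BY DATE(`timestamp`), `username`;
```

The report would then read from the small summary table instead of scanning `log`.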

1 Answer


Well, an index on the timestamp and username columns would be helpful. You also need to be able to read the output of an EXPLAIN statement.

Go to MySQL and run the following:

EXPLAIN SELECT timestamp, username, [a few more] FROM log 
WHERE timestamp  BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00' 
AND username='XX' 

This shows you the plan MySQL uses to execute the query. There will be a column called key, which indicates which index MySQL is using for the query, and a column called type, which indicates how the table is accessed. I suspect you will see ALL in the type column (with key empty), which means MySQL is scanning the table from top to bottom, matching every row against your WHERE clause. Now create an index on the timestamp and username columns and run the EXPLAIN statement again. You should see the index that you created in the key column.

If MySQL uses the index then your query should be considerably faster. Just remember not to over-index. Indexes make inserts, updates and deletes slower: when you insert a new row into a table that has three indexes, MySQL also has to write an entry to each of those three indexes. So it is a double-edged sword.
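
As a minimal sketch of the steps above: the index name and column order below are the ones the asker ended up with in the update to the question, and the column list is shortened to the two columns named there, since the rest of the SELECT list is elided.

```sql
-- Compound index on the two columns used in the WHERE clause.
-- Name and column order are taken from the question's update.
ALTER TABLE `log`
  ADD INDEX `index_un_ts` (`timestamp`, `username`);

-- Re-check the plan: `key` should now show index_un_ts
-- and `type` should no longer be ALL.
EXPLAIN
SELECT `timestamp`, `username`
FROM `log`
WHERE `timestamp` BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00'
  AND `username` = 'XX';
```

On a table of this size the ALTER TABLE statement can take a long time, so it is probably best run outside peak hours.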


5 Comments

Oh, just by the way, the CREATE INDEX statement will run for a while, so do not panic if it just sits there. It has to scan the entire table, get the values and then insert them into the new index structure. An index is a B-tree or R-tree stored on disk and kept in sync with your table.
Alter your table with openark kit.
Explain does come back with ALL. Would you recommend creating two indexes, one for timestamp and one for username, or a compound index of the two?
Add them into one index. MySQL will generally use only one index per table for this search. When MySQL finds the value you want in the index, it still has to fetch the row from the table: the index contains the column value(s) and a pointer to the row, not the actual row. Two separate indexes will not help in this case, because MySQL cannot find the rows by timestamp in one index and then narrow those same rows by username in the other. If you had two indexes, MySQL would choose the best one and use only that.
Thank you for all your explanations. Please see the results of your changes in my original post and see if I understood you correctly. Adding the indexes did take forever, so I ended up deleting a lot of the data (as it wasn't production data anyway). Therefore I can't really compare the times, but with the 2040758 rows remaining, the query above finishes in 0.05 seconds, so it seems to be working.
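
One thing that may be worth checking after the update: with `timestamp` as the leading column, the range condition is applied first, so the index scan covers every user's entries in the time window (EXPLAIN estimates 52258 rows for a result of a few thousand). A sketch of how the opposite column order could be compared, assuming a second index with the hypothetical name `index_un_ts_alt`; whether it actually helps on this data is not confirmed anywhere in this thread.

```sql
-- Hypothetical second index with the equality column first
-- and the range column second.
ALTER TABLE `log`
  ADD INDEX `index_un_ts_alt` (`username`, `timestamp`);

-- Plan when forced onto the existing index (timestamp first).
EXPLAIN
SELECT `timestamp`, `username`
FROM `log` FORCE INDEX (`index_un_ts`)
WHERE `timestamp` BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00'
  AND `username` = 'XX';

-- Plan when forced onto the alternative index (username first).
EXPLAIN
SELECT `timestamp`, `username`
FROM `log` FORCE INDEX (`index_un_ts_alt`)
WHERE `timestamp` BETWEEN '2012-03-30 08:00:00' AND '2012-03-30 16:00:00'
  AND `username` = 'XX';

-- After comparing the `rows` estimates, drop whichever index is not needed,
-- for example:
-- ALTER TABLE `log` DROP INDEX `index_un_ts_alt`;
```

Keeping both indexes permanently would add write overhead for little benefit, so the comparison is meant as a one-off check.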
