
I have the below query to optimize.

SELECT count(*) AS count FROM area 
INNER JOIN entity ON area.id = entity.id
INNER JOIN areacust ON area.id = areacust.id 
WHERE entity.deleted=0 
AND area.id > 0

There are indexes on deleted and on the id columns of all the tables.

Now suppose I have 20 lakh (2 million) records; the query then takes a long time to return a result, between 10 and 20 seconds.

How can I optimize it further? Also, is there any other technique to get the count?

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  vtiger_crmentity    ref PRIMARY,entity_deleted_idx  entity_deleted_idx  4      const    729726  Using where; Using index
1   SIMPLE  area    eq_ref  PRIMARY PRIMARY 4   area.id 1   Using index
1   SIMPLE  areacust    eq_ref  PRIMARY PRIMARY 4   area.id 1   Using where; Using index

New EXPLAIN with the composite key:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  entity  ref PRIMARY,entity_deleted_idx,comp_index   deleted_idx 4   const   928304  Using index
1   SIMPLE  area    eq_ref  PRIMARY PRIMARY 4   entity.id   1   Using index
1   SIMPLE  areacust    eq_ref  PRIMARY PRIMARY 4   entity.id   1   Using index
  • Post EXPLAIN plan of your query Commented Aug 11, 2014 at 13:12
  • have you tried adding FORCE INDEX ..... to the select? Commented Aug 11, 2014 at 13:17
  • how to add FORCE INDEX in this query? Commented Aug 11, 2014 at 13:18
  • FROM area FORCE INDEX (area.id) Commented Aug 11, 2014 at 13:19
  • @Pramod there's a few things about that. First, if you want to count everything, most people will write it as COUNT(*). If you want to count only things in one table you could do COUNT(a.id), but that would only matter if there were LEFT JOINs that caused there to be additional rows. MySQL performs almost identically for COUNT(id) and COUNT(*), except when there are NULL values to be counted; if there are NULL values you should always use COUNT(*), because MySQL knows how to handle it better that way and it cuts the execution time in half. Commented Aug 11, 2014 at 13:28
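The COUNT(*) vs COUNT(column) distinction from the comment above can be illustrated with a small query; the table and column names here are made up for the example:

```sql
-- COUNT(*) counts every row in the result;
-- COUNT(col) counts only rows where col IS NOT NULL.
SELECT COUNT(*)            AS all_rows,
       COUNT(nullable_col) AS non_null_rows
FROM some_table;
```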

2 Answers


As per the comments, if you want to keep the query as-is, you have to allocate more resources to your MySQL instance. I am assuming you use InnoDB as the storage engine; otherwise this advice is useless:

Increase the value of the innodb_buffer_pool_size variable as much as you can. You want to allocate as much RAM to it as possible.
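To see what is currently configured (the default mentioned later in the comments, 8 MB, was the historic default), you can check the variable directly. Note that on the MySQL versions current at the time, changing it meant editing my.cnf (e.g. innodb_buffer_pool_size = 2G) and restarting the server:

```sql
-- The value is reported in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```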

Also, get rid of the index on the deleted column; it's useless. Its cardinality is too low for it to be a useful index.

The other "technique" that you can (should) use is taking care of this count manually.

Create a table that holds the count you are interested in. Every time you insert, update, or delete an entity or area record, adjust the count manually (increment or decrement).

That way, all you have to do is look up a single record in a single table. Setting up triggers that sort this out automatically should be trivial. You take care of the count at write time instead of wasting I/O and CPU constantly traversing the data set.
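A rough sketch of that summary-table approach; the table and trigger names below are my own, and you would need matching UPDATE/DELETE triggers (and triggers on area and areacust) to keep the count exact:

```sql
-- One-row table holding the precomputed count.
CREATE TABLE entity_count (cnt INT UNSIGNED NOT NULL);

-- Seed it once with the current value.
INSERT INTO entity_count (cnt)
SELECT COUNT(*)
  FROM area
  JOIN entity   ON entity.id = area.id AND entity.deleted = 0
  JOIN areacust ON areacust.id = area.id;

-- Keep it current on insert (assumes the matching area/areacust rows exist).
DELIMITER //
CREATE TRIGGER entity_count_ins AFTER INSERT ON entity
FOR EACH ROW
BEGIN
  IF NEW.deleted = 0 THEN
    UPDATE entity_count SET cnt = cnt + 1;
  END IF;
END//
DELIMITER ;
```

Reading the count then becomes SELECT cnt FROM entity_count; — a single-row lookup.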


4 Comments

I like getting the query optimised with indexes as much as possible before increasing resources.. but that may be his only option here. Like the idea of a summary table, super super fast.
Indexes can't help here. They are not a magic pill that makes everything fast just like that; an index is a simple data structure, and in this case having one on deleted doesn't help at all. The original query as written cannot be optimized further using indexes. The only available "optimization" would be dropping the index on deleted, which would just free up some storage space; performance would be the same. The real answer is increasing the resources (the default value of innodb_buffer_pool_size is 8 MB, which is way too low) or creating a table to look up the counter.
Indexes absolutely can help here! The indexes described in my answer cover all the information required by the query so can satisfy the count(*) without the engine having to go into the row to check the entity deleted value.. which will save a lot of memory if entity has lots of columns. Increasing memory size is great but you can only do it once, and it is expensive to keep upgrading the server, so you want your queries as fast as possible. That's why I like the summary table.. solutions that are portable.
Also, if you look at his EXPLAIN, it looks like the optimiser is using the index on deleted over the primary key for entity, which is crazy! I'd be wanting to fix that first.

You could try:

SELECT count(*) AS count 
  FROM area 
  JOIN entity
    ON entity.id = area.id
   AND entity.deleted = 0 
  JOIN areacust 
    ON areacust.id = area.id 

I like to include conditions in JOINs where possible and keep the table I'm JOINing on the left of the equals in these conditions.

Also, the WHERE area.id > 0 was strange: most foreign keys start at 1 due to auto_increment ids in the referenced tables, so this condition includes all rows. I have deleted it.

From the look of your EXPLAIN, you don't really want the top row to be using entity_deleted_idx. You may get more joy with a composite index on (id, deleted) for entity.

These are the indexes I'd have for this query:

  • area - (id) This is probably the PRIMARY already
  • areacust - (id) This is probably the PRIMARY already
  • entity - (id, deleted) This should be added and used.
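Creating that composite index could look like this (the index name is arbitrary):

```sql
-- Covers both the join on id and the deleted = 0 filter,
-- so the count can be satisfied from the index alone.
ALTER TABLE entity ADD INDEX entity_id_deleted_idx (id, deleted);
```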

UPDATE

Remove all unused indexes from the entity table except for the PRIMARY key and the composite index.

If that doesn't work run:

SELECT count(*) AS count 
  FROM area 
  JOIN entity USE INDEX (**composite_index_name**)
    ON entity.id = area.id
   AND entity.deleted = 0 
  JOIN areacust 
    ON areacust.id = area.id 

11 Comments

It's taking the same amount of time, or even more.
@RahulTailwal Fair enough.. did you try the composite index on (id,deleted)?
Most optimizers will generate identical plans for this and the original query. These are semantically identical, so...
@Clockwork-Muse Yes that may be true, but it is way easier to see potential indexes and speed ups with this query, hence my concluding paragraphs.
The reason why this takes time is because your database is HDD bound. It simply uses up hard disk I/O because it can't fit the working data set into memory. Culprit is always innodb_buffer_pool value, if you use InnoDB (which you should). Increase that number to at least 70% of the memory of your server (yes, 70%, you want to have as much data as possible in the memory because that's how you get speed).
