
I have the below query to optimize.

SELECT count(*) AS count FROM area 
INNER JOIN entity ON area.id = entity.id
INNER JOIN areacust ON area.id = areacust.id 
WHERE entity.deleted=0 
AND area.id > 0

There are indexes on deleted and on the id columns of all the tables.

Now suppose I have 20 lakh (2 million) records; the query then takes a long time to return a result, between 10 and 20 seconds.

How can I optimize it further? Also, is there any other technique to get the count?

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  vtiger_crmentity    ref PRIMARY,entity_deleted_idx  entity_deleted_idx  4      const    729726  Using where; Using index
1   SIMPLE  area    eq_ref  PRIMARY PRIMARY 4   area.id 1   Using index
1   SIMPLE  areacust    eq_ref  PRIMARY PRIMARY 4   area.id 1   Using where; Using index

New EXPLAIN with the composite key:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  entity  ref PRIMARY,entity_deleted_idx,comp_index   deleted_idx 4   const   928304  Using index
1   SIMPLE  area    eq_ref  PRIMARY PRIMARY 4   entity.id   1   Using index
1   SIMPLE  areacust    eq_ref  PRIMARY PRIMARY 4   entity.id   1   Using index
  • Post EXPLAIN plan of your query Commented Aug 11, 2014 at 13:12
  • have you tried adding FORCE INDEX ..... to the select? Commented Aug 11, 2014 at 13:17
  • how to add FORCE INDEX in this query? Commented Aug 11, 2014 at 13:18
  • FROM area FORCE INDEX (area.id) Commented Aug 11, 2014 at 13:19
  • @Pramod there's a few things about that. First, if you want to count everything, most people will write it as COUNT(*). If you want to count only things in one table you could do COUNT(a.id), but that would only matter if there were LEFT JOINs that caused there to be additional rows. MySQL performs almost identically for COUNT(id) and COUNT(*), except when there are NULL values to be counted; if there are NULL values you should always use COUNT(*), because MySQL knows how to handle it better that way and it cuts the execution time in half. Commented Aug 11, 2014 at 13:28
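The COUNT(*) vs COUNT(column) distinction from the comment above can be illustrated with a small query; the table and column names here are made up for the example:

```sql
-- COUNT(*) counts every row in the result;
-- COUNT(col) counts only rows where col IS NOT NULL.
SELECT COUNT(*)            AS all_rows,
       COUNT(nullable_col) AS non_null_rows
FROM some_table;
```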

2 Answers


As per the comments, if you want to keep the query as-is, you have to allocate more resources to your MySQL instance. I am assuming you use InnoDB as the storage engine; otherwise this advice is useless:

Increase the value of the innodb_buffer_pool_size variable as much as you can. You want to allocate as much RAM to it as possible.
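To see what is currently configured (the default mentioned later in the comments, 8 MB, was the historic default), you can check the variable directly. Note that on the MySQL versions current at the time, changing it meant editing my.cnf (e.g. innodb_buffer_pool_size = 2G) and restarting the server:

```sql
-- The value is reported in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```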

Also, get rid of the index on the deleted column; it's useless. Its cardinality is too low for it to be a useful index.

The other "technique" that you can (should) use is taking care of this count manually.

Create a table that holds the count you are interested in. Every time you insert, update, or delete an entity or area record, adjust the count manually (increment or decrement).

That way, all you have to do is look up a single record in a single table. Setting up triggers that sort this out automatically should be trivial. You take care of the count at write time instead of wasting I/O and CPU constantly traversing the data set.
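A rough sketch of that summary-table approach; the table and trigger names below are my own, and you would need matching UPDATE/DELETE triggers (and triggers on area and areacust) to keep the count exact:

```sql
-- One-row table holding the precomputed count.
CREATE TABLE entity_count (cnt INT UNSIGNED NOT NULL);

-- Seed it once with the current value.
INSERT INTO entity_count (cnt)
SELECT COUNT(*)
  FROM area
  JOIN entity   ON entity.id = area.id AND entity.deleted = 0
  JOIN areacust ON areacust.id = area.id;

-- Keep it current on insert (assumes the matching area/areacust rows exist).
DELIMITER //
CREATE TRIGGER entity_count_ins AFTER INSERT ON entity
FOR EACH ROW
BEGIN
  IF NEW.deleted = 0 THEN
    UPDATE entity_count SET cnt = cnt + 1;
  END IF;
END//
DELIMITER ;
```

Reading the count then becomes SELECT cnt FROM entity_count; — a single-row lookup.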


4 Comments

I like getting the query optimised with indexes as much as possible before increasing resources.. but that may be his only option here. Like the idea of a summary table, super super fast.
Indexes can't help here. They are not a magic pill that makes everything fast just like that; an index is a simple data structure, and in this case having one on deleted doesn't help at all. The original query as written cannot be optimized further using indexes. The only available "optimization" would be dropping the index on deleted, which would just free up some storage space; performance would be the same. The real answer is increasing the resources (the default value of innodb_buffer_pool_size is 8 MB, which is way too low) or creating a table to look up the counter.
Indexes absolutely can help here! The indexes described in my answer cover all the information required by the query so can satisfy the count(*) without the engine having to go into the row to check the entity deleted value.. which will save a lot of memory if entity has lots of columns. Increasing memory size is great but you can only do it once, and it is expensive to keep upgrading the server, so you want your queries as fast as possible. That's why I like the summary table.. solutions that are portable.
Also, if you look at his EXPLAIN, it looks like the optimiser is using the index on deleted over the primary key for entity, which is crazy! I'd be wanting to fix that first.

You could try:

SELECT count(*) AS count 
  FROM area 
  JOIN entity
    ON entity.id = area.id
   AND entity.deleted = 0 
  JOIN areacust 
    ON areacust.id = area.id 

I like to include conditions in JOINs where possible and keep the table I'm JOINing on the left of the equals in these conditions.

Also, the WHERE area.id > 0 was strange: most foreign keys start at 1 due to auto_increment ids in the referenced tables, so this condition includes all rows. I have deleted it.

From the look of your EXPLAIN, you don't really want the top row to be using entity_deleted_idx. You may get more joy with a composite index on (id, deleted) for entity.

These are the indexes I'd have for this query:

  • area - (id) This is probably the PRIMARY already
  • areacust - (id) This is probably the PRIMARY already
  • entity - (id, deleted) This should be added and used.
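Creating that composite index could look like this (the index name is arbitrary):

```sql
-- Covers both the join on id and the deleted = 0 filter,
-- so the count can be satisfied from the index alone.
ALTER TABLE entity ADD INDEX entity_id_deleted_idx (id, deleted);
```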

UPDATE

Remove all unused indexes from the entity table except for the PRIMARY key and the composite index.

If that doesn't work run:

SELECT count(*) AS count 
  FROM area 
  JOIN entity USE INDEX (**composite_index_name**)
    ON entity.id = area.id
   AND entity.deleted = 0 
  JOIN areacust 
    ON areacust.id = area.id 

11 Comments

It's taking the same amount of time, or even more.
@RahulTailwal Fair enough.. did you try the composite index on (id,deleted)?
Most optimizers will generate identical plans for this and the original query. These are semantically identical, so...
@Clockwork-Muse Yes that may be true, but it is way easier to see potential indexes and speed ups with this query, hence my concluding paragraphs.
The reason why this takes time is because your database is HDD bound. It simply uses up hard disk I/O because it can't fit the working data set into memory. Culprit is always innodb_buffer_pool value, if you use InnoDB (which you should). Increase that number to at least 70% of the memory of your server (yes, 70%, you want to have as much data as possible in the memory because that's how you get speed).
