I have a SQL query that I have been trying to optimize. Through various means I reduced its run time from over 5 seconds to about 1.3 seconds, but I can't get it any lower. Can anyone suggest further improvements?

The EXPLAIN diagram shows a full scan: [image: explain diagram]

The EXPLAIN table gives more details: [image: explain tabular]

A simplified version of the query is shown below. For reference, I'm using MySQL 5.6.

select * from (
  select 
    @row_num := if(@yacht_id = yacht_id and @charter_type = charter_type and @start_base_id = start_base_id and @end_base_id = end_base_id, @row_num +1, 1) as row_number,
    @yacht_id := yacht_id as yacht_id, 
    @charter_type := charter_type as charter_type,
    @start_base_id := start_base_id as start_base_id,
    @end_base_id := end_base_id as end_base_id,
    model, offer_type, instant, rating, reviews, loa, berths, cabins, currency, list_price, list_price_per_day, 
    discount, client_price, client_price_per_day, days, date_from, date_to, start_base_city, end_base_city, start_base_country, end_base_country, 
    service_binary, product_id, ext_yacht_id, main_image_url
  from (
    select
      offer.yacht_id, offer.charter_type, yacht.model, offer.offer_type, offer.instant, yacht.rating, yacht.reviews, yacht.loa, 
      yacht.berths, yacht.cabins, offer.currency, offer.list_price, offer.list_price_per_day, 
      offer.discount, offer.client_price, offer.client_price_per_day, offer.days, date_from, date_to,
      offer.start_base_city, offer.end_base_city, offer.start_base_country, offer.end_base_country,
      offer.service_binary, offer.product_id, offer.start_base_id, offer.end_base_id,
      yacht.ext_yacht_id, yacht.main_image_url
    from website_offer as offer
    join website_yacht as yacht
      on offer.yacht_id = yacht.yacht_id, 
    (select @yacht_id:='') as init
    where date_from > CURDATE() 
      and date_to <= CURDATE() + INTERVAL 3 MONTH
      and days = 7
    order by offer.yacht_id, charter_type, start_base_id, end_base_id, list_price_per_day asc, discount desc
  ) as filtered_offers
) as offers
where row_number=1;

Thanks, goppi

UPDATE

I had to abandon some performance improvements and replaced the original select with the new one. The select query is actually built dynamically by the backend based on which filter criteria are set, so the where clause of the innermost select can expand quite a lot. However, this is the default select when no filter is set, and it is the version that takes significantly longer than 1 sec.

explain in text form - doesn't come out pretty as I couldn't figure out how to format a table, but here it is:

1 PRIMARY ref <auto_key0> <auto_key0> 9 const 10
2 DERIVED ALL 385967
3 DERIVED system 1 Using filesort
3 DERIVED offer ref idx_yachtid,idx_search,idx_dates idx_dates 5 const 385967 Using index condition; Using where
3 DERIVED yacht eq_ref PRIMARY,id_UNIQUE PRIMARY 4 yachtcharter.offer.yacht_id 1
4 DERIVED No tables used

  • Which version of MySQL are you running? Commented Sep 25, 2020 at 20:57
  • please show the output of show create table website_offer and show create table website_yacht and explain select ....rest of your query as text, not an image Commented Sep 25, 2020 at 21:11
  • how long does the innermost select (select * from website_offer ... limit 5000) take? what is the reason for the limit 5000? Commented Sep 25, 2020 at 21:16
  • By the way, avoid using variables in MySQL. Assigning them inside expressions is deprecated in MySQL 8.x. Commented Sep 25, 2020 at 22:54
  • Definitely need to see the table defs as text as @ysth suggested. One thing that sticks out is that you are defeating any indexes that might exist on date_from or date_to in your website_offer subquery by applying a function (date) on those columns. You should rewrite both of those in a way that avoids that if possible. The right approach depends on the type of those columns. An answer below steps in that direction, but you should do it for date_to as well. Commented Sep 25, 2020 at 23:28
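
For readers who are not tied to MySQL 5.6, the comment about avoiding user variables can be sketched with a window function. This is only an illustration, assuming MySQL 8.0 or later (where ROW_NUMBER() is available); it does not run on the asker's 5.6 server. The alias rn and the derived-table name ranked are made up here, the column list is shortened for brevity, and the predicates are copied from the question's query.

```sql
-- Sketch only: requires MySQL 8.0+; not applicable to MySQL 5.6.
-- "rn" and "ranked" are illustrative names, not from the original query.
SELECT *
FROM (
  SELECT
    offer.yacht_id, offer.charter_type, offer.start_base_id, offer.end_base_id,
    yacht.model, offer.list_price_per_day, offer.discount,
    -- Number the offers within each (yacht, charter type, start base, end base)
    -- group, cheapest per-day price first, then highest discount.
    ROW_NUMBER() OVER (
      PARTITION BY offer.yacht_id, offer.charter_type,
                   offer.start_base_id, offer.end_base_id
      ORDER BY offer.list_price_per_day ASC, offer.discount DESC
    ) AS rn
  FROM website_offer AS offer
  JOIN website_yacht AS yacht
    ON offer.yacht_id = yacht.yacht_id
  WHERE offer.date_from > CURDATE()
    AND offer.date_to <= CURDATE() + INTERVAL 3 MONTH
    AND offer.days = 7
) AS ranked
WHERE rn = 1;  -- keep only the best offer of each group
```

The alias is rn rather than row_number because ROW_NUMBER is a reserved word in MySQL 8.0. Whether this is faster than the variable-based version would have to be measured on the real data.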

2 Answers

Subselects are never great. You should sign up here: https://www.eversql.com/

Run that and it will give you all the right indexes and optimisations you need for this query.

2 Comments

Sometimes subselects are the best thing to do, sometimes they are awful, and sometimes they are just equivalent to a join. There is no general rule.
@MrPHP - eversql didn't perform well in this case. It suggested adding an index on website_offer for (days, yacht_id, date_from), which did get used instead of idx_dates, but with the consequence that no key was used for website_yacht and the run time rose from 3.3 secs to 4.5 secs.

There's still some optimization you can do. Considering that the subquery returns only 5000 rows, you could use an index for it.

First rephrase the predicate as:

select *
from website_offer
where date_from >= CURDATE() + INTERVAL 1 DAY -- rephrased here
  and date(date_to) <= CURDATE() + INTERVAL 3 MONTH
  and days = 7
order by yacht_id, charter_type, list_price_per_day asc, discount desc
limit 5000    

Then, if you add the following index the performance could improve:

create index ix1 on website_offer (days, date_from, date_to);

6 Comments

You should rephrase the date_to predicate as well if you want to take full advantage of the new index I think.
and presumably date_from is always on or before date_to, so the date_to condition could also be used for date_from and vice versa
@totalhack - Yes, rephrase date_to also. But it won't help unless you also add INDEX(days, date_to, date_from). That will give the Optimizer two approaches to try -- it may find that testing date_to will be faster. The Optimizer will not use both date_from and date_to except via "ICP".
@The Impaler - taking out the date() function didn't make any difference to the performance. It didn't seem to make a difference to the use of the index either. The index idx_dates includes date_from, date_to and days and was used with the date() function and without.
@Rick James that's interesting, I assumed it could have taken advantage of the full (days, date_from, date_to) index with the predicates fixed. Is there a particular reason the optimizer wouldn't be able to use the full multi part index? Any docs you can point me to explaining that? Thanks.
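
The suggestions in the comments above (rephrase date_to as well, and offer the optimizer a second index that leads with date_to) can be sketched as follows. This is a hedged illustration, not a tested fix: the index names ix_days_from_to and ix_days_to_from are hypothetical, the first index is the answer's ix1 under another name, and the half-open bound on date_to is written so that it matches date(date_to) <= CURDATE() + INTERVAL 3 MONTH for both DATE and DATETIME columns.

```sql
-- Sketch only: index names are hypothetical; column types are assumed to be DATE or DATETIME.
CREATE INDEX ix_days_from_to ON website_offer (days, date_from, date_to);
CREATE INDEX ix_days_to_from ON website_offer (days, date_to, date_from);

-- Both date predicates compare the bare column, so neither wraps it in date().
-- EXPLAIN shows which of the two indexes the optimizer actually picks.
EXPLAIN
SELECT *
FROM website_offer
WHERE days = 7
  AND date_from >= CURDATE() + INTERVAL 1 DAY
  AND date_to < CURDATE() + INTERVAL 3 MONTH + INTERVAL 1 DAY
ORDER BY yacht_id, charter_type, list_price_per_day ASC, discount DESC
LIMIT 5000;
```

As the asker reports above, removing date() made no measurable difference with the existing idx_dates index, so any gain from the second index should be confirmed with EXPLAIN and timings on the real data.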