I'm having trouble optimizing a query and could use some help. I'm pulling in events in a system that has to join several other tables to make sure each event is supposed to display. The query was running smoothly (around 480ms) until I introduced another table into the mix. The query is as follows:

SELECT 
    keyword_terms, 
    `esf`.*, 
    `venue`.`name` AS venue_name, 
    ...
    `venue`.`zip`, ase.region_id, 
    (DATE(NOW()) BETWEEN...AND ase.region_id IS NULL) as featured, 
    getDistance(`venue`.`lat`, `venue`.`lng`, 36.073, -79.7903) as distance, 
    `network_exclusion`.`id` as net_exc_id
FROM (`event_search_flat` esf)
# Problematic part of query (pulling in the very next date for the event)
LEFT JOIN (
        SELECT event_id, MIN(TIMESTAMP(CONCAT(event_date.date, ' ', event_date.end_time))) AS next_date FROM event_date WHERE 
        event_date.date >= CURDATE() OR (event_date.date = CURDATE() AND TIME(event_date.end_time) >= TIME(NOW()))
        GROUP BY event_id
) edate ON edate.event_id=esf.object_id
# Pull in associated ad space
LEFT JOIN `ad_space` ads ON `ads`.`data_type`=`esf`.`data_type` AND ads.object_id=esf.object_id
# and make sure it is featured within region
LEFT JOIN `ad_space_exclusion` ase ON ase.ad_space_id=ads.id AND region_id =5
# Get venue details
LEFT JOIN `venue` ON `esf`.`venue_id`=`venue`.`id`
# Make sure this event should be listed
LEFT JOIN `network_exclusion` ON network_exclusion.data_type=esf.data_type  
                 AND network_exclusion.object_id=esf.object_id
                 AND network_exclusion.region_id=5
WHERE `esf`.`event_type` IN ('things to do') 
AND (`edate`.`next_date` >= '2013-07-18 16:23:53')
GROUP BY `esf`.`esf_id`
HAVING `net_exc_id` IS NULL
AND `distance` <= 40
ORDER BY DATE(edate.next_date) asc, 
`distance` asc
LIMIT 6

It seems that the issue lies with the event_date table, but I'm unsure how to optimize this query (I tried various views, indexes, etc. to no avail). I ran EXPLAIN and received the following: http://cl.ly/image/3r3u1o0n2A46

At the moment, the query is taking 6.6 seconds. Any help would be greatly appreciated.

  • What do you need from the event_date table? Only the MIN(TIMESTAMP(...)) portion of your query? Is that a distinct item per row, or can there be multiples? Commented Jul 18, 2013 at 20:45
  • Adding indexes on esf, ads, ase, and network_exclusion should help significantly. Commented Jul 18, 2013 at 20:48
  • event_date contains every date for each event. I only need the next_date (i.e. The MIN(date) that is greater than right now). Commented Jul 18, 2013 at 20:50

1 Answer

  • You may be able to get Using index on the event_date subquery by creating a compound index over (event_id, date, end_time). That may turn the subquery into an index-only query, which should speed it up slightly.

    The subquery might be better written as the following, without GROUP BY:

    SELECT event_id, TIMESTAMP(CONCAT(event_date.date, ' ', event_date.end_time)) AS next_date
    FROM event_date 
    WHERE event_date.date >= CURDATE() 
      OR (event_date.date = CURDATE() AND TIME(event_date.end_time) >= TIME(NOW()))
    ORDER BY next_date LIMIT 1
    
  • I'm more concerned that your EXPLAIN shows so many tables with type=ALL. That means it has to read every row from those tables and compare them to rows in other tables. You can get an idea of how much work it's doing by multiplying the values in the rows column. Basically, it's making billions of row comparisons to resolve the joins. As the tables grow, this query will get a lot worse.

  • Using LEFT [OUTER] JOIN has a specific purpose, and if you really mean to use INNER JOIN you should do that, because using an outer join where it doesn't belong can mess up the optimization. Use an outer join like A LEFT JOIN B only if you want rows in A that may not have matching rows in B.

    For example, I assume based on column naming convention that LEFT JOIN venue ON esf.venue_id=venue.id should be an inner join, because there should always be a venue referenced by esf.venue_id (unless esf.venue_id is sometimes null).

  • event_search_flat should have a compound index with columns used in the WHERE clause first, then columns to join to other tables: (event_type, object_id, data_type, event_id)

  • ad_space should have a compound index for the join: (data_type, object_id). Does this need to be an inner join too?

  • ad_space_exclusion should have a compound index for the join: (ad_space_id, region_id)

  • network_exclusion should have a compound index for the join: (data_type, object_id, region_id)

  • venue is okay because it's doing a primary key lookup already.
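
The index suggestions above could be written out roughly as follows. This is only a sketch in MySQL syntax: the index names are made up for illustration, and the column lists are copied from the bullets, so check that each column (event_id on event_search_flat in particular) actually exists in your schema before running it.

```sql
-- Hypothetical index names; column lists follow the suggestions above.

-- May let the event_date subquery be resolved from the index alone.
ALTER TABLE event_date
  ADD INDEX idx_event_date_next (event_id, `date`, end_time);

-- WHERE column first, then the join columns.
-- Assumption: event_id exists on this table, as listed in the answer.
ALTER TABLE event_search_flat
  ADD INDEX idx_esf_type_join (event_type, object_id, data_type, event_id);

-- Join indexes for the three lookup tables.
ALTER TABLE ad_space
  ADD INDEX idx_ads_type_object (data_type, object_id);

ALTER TABLE ad_space_exclusion
  ADD INDEX idx_ase_space_region (ad_space_id, region_id);

ALTER TABLE network_exclusion
  ADD INDEX idx_ne_type_object_region (data_type, object_id, region_id);
```

After adding them, re-run EXPLAIN on the full query and see whether the rows that showed type=ALL now show ref or eq_ref with a smaller rows estimate.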

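To illustrate the inner versus outer join point, here is a cut-down sketch of the same query shape. It assumes every esf.venue_id references a venue row, and it moves the exclusion check from HAVING into WHERE, which is my own simplification for the example rather than something taken from the original query.

```sql
SELECT esf.esf_id,
       venue.name AS venue_name
FROM event_search_flat esf
-- Inner join: assumes esf.venue_id is never NULL and always matches a venue.
INNER JOIN venue
        ON venue.id = esf.venue_id
-- Outer join kept on purpose: we want events with NO matching exclusion row.
LEFT JOIN network_exclusion ne
       ON ne.data_type = esf.data_type
      AND ne.object_id = esf.object_id
      AND ne.region_id = 5
WHERE esf.event_type IN ('things to do')
  -- Keeps only events that have no exclusion row for region 5.
  AND ne.id IS NULL;
```

The LEFT JOIN to network_exclusion has to stay an outer join because the query is looking for the absence of a match. The venue join has no such requirement if the assumption about venue_id holds.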

2 Comments

Bill, just a quick read over your comments and I'm already encouraged. Your advice looks most excellent. Will implement your suggestions soon and heck, maybe even pick up a copy of your book.
Thanks for considering my book. So were these suggestions successful?
