0

I was wondering if anybody would have some input on how I could possibly optimize this MySQL query. I think I'm doing the right thing with indexes so don't think I can get this query any faster (it's currently taking more than 3 seconds to run) but would really love somebody with more experience to prove me wrong.

Here's the query:

SELECT `rooms`.*,
       ((IFNULL(SUM(av.host_daily_price), 0) + 
         rooms.host_daily_price * (4 - COUNT(DISTINCT av.id))) / 4) / 1 as 'price', 
       rooms.*, 
       (ACOS(least(1, COS(0.7115121627883911) * COS(1.291278129536698) *
                      COS(RADIANS(rooms.lat)) * COS(RADIANS(rooms.lng)) +
                      COS(0.7115121627883911) * SIN(-1.291278129536698) *
                      COS(RADIANS(rooms.lat)) * SIN(RADIANS(rooms.lng)) +
                      SIN(0.7115121627883911) * SIN(RADIANS(rooms.lat)))) * 3963.19) AS distance 
FROM `rooms`
LEFT JOIN availabilities AS av 
  ON (av.room_id = rooms.id AND
      av.date BETWEEN '2010-12-29' AND '2011-01-01')    
WHERE (rooms.deleted_at IS NULL) AND
      (`rooms`.`hidden` = 0) AND
      (rooms.id <> 7713) AND
      (rooms.city_id = 1 AND
         rooms.max_guests >= 4 AND
         rooms.minimum_stay <= 3 AND
         rooms.room_type IN ('room','apartment','house')) AND
      (av.inquiry_id IS NULL) 
GROUP BY  rooms.id 
HAVING SUM(IFNULL(status, 0)) = 0 AND
       (COUNT(*) = 4 OR `rooms`.default_available = 1) 
ORDER BY distance ASC
LIMIT 12;

The output of Explain:

id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   SIMPLE  rooms   ref PRIMARY,index_rooms_on_city_id,index_rooms_on_room_type,index_rooms_on_city_id_and_updated_at   index_rooms_on_city_id  5   const   2412    Using where; Using temporary; Using filesort

1   SIMPLE  av  ref index_availabilities_on_room_id,index_availabilities_on_room_id_and_date,index_availabilities_on_room_id_and_date_and_status    index_availabilities_on_room_id 5   roomorama.rooms.id  79  Using where

Let me know if any other information would be useful!

1
  • Is that a typo? Why do you have rooms.* twice in the SELECT? Commented Oct 15, 2010 at 4:43

5 Answers 5

1

You should seriously consider storing the distance instead of calculating it, specially if you are sorting by that value.

Sign up to request clarification or add additional context in comments.

2 Comments

The distance is going to be different in every query as I'm searching for rooms that are close to a certain location (that changes all the time)
OK then use MySQL Spatial Extensions dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
1

How about:

1 - Only fetch all the data from rooms once. As @OMG Ponies noted in his comment, all columns from rooms are being fetched twice in the query.

2 - If the trig function being performed on constants in the query are truly constant (e.g. COS(0.7115121627883911)) replace them with the computed values, i.e.

COS(0.7115121627883911) = .7573753305446695179374104150422980521625
COS(1.291278129536698) = .2758925773610728508649620468976736490713
COS(0.7115121627883911) = .7573753305446695179374104150422980521625
SIN(-1.291278129536698) = -.9611884756680473394167554039882007538993

3 - This query appears to be doing the spherical trig to get the correct distance between two points on the surface of the earth. It might be faster and equally useful to compute an approximate distance by doing something like

dist = SQRT( (lat2-lat1)^2 + ((long2-long1) * COS(RADIANS(lat1+lat2)/2)))^2 ) * 60

This (should) give the distance between the points (lat1, long1) and (lat2, long2) in miles. Adjust the trailing constant for whatever distance measure you prefer.

Share and enjoy.

1 Comment

I assume the apparent constants are really query parameters relating to the location of the room-hunter. Even so, i would imagine the query optimiser will do the constant inlining itself. The idea of using a Cartesian approximation to simplify calculation seems like a good one; i'd go further, and base the normalisation factor entirely on the latitude that's a query parameter, ignoring the room's latitude, so that it can be computed once a the start of the query, rather than having to be evaluated per row. If we're talking about rooms in a city, the error will be small.
0

Without seeing how your tables are indexed, it's hard to tell if there are any specific problems. I don't know the MySQL explain output, so I'm not going to fake anything.

However, one thing you CAN do is create conditional indexes. For example, in addition to having your common indexes like

create index rooms_by_id on rooms(room_id);

you can have indexes that already have the conditions built in, thus reducing the number of index records that have to be traversed

create index rooms_by_id_usable on rooms(room_id)
    WHERE (deleted_at IS NOT NULL) and (hidden <> 0) 

If 20% of your rooms records match deleted_at IS NULL and hidden <> 0), then this alternate index rooms_by_id_usable will be 20% smaller than rooms_by_id, and will take (roughly) 20% less time to traverse.

This all depends on the MySQL optimizer and how it chooses to use indexes and so on. I know in PostgreSQL that this works very nicely indeed.

1 Comment

Sadly, i don't think MySQL supports partial indexes. They have something called a partial index, but it's different, and not useful here.
0

Is the performance significantly improved if the distance calculation is removed? If so, it might be worth storing the sin and cos of lat and lng on the room table (and using the stored values in the query) - these functions are relatively processor-intensive, so deriving them for all of a relatively large dataset may significantly impact performance.

Comments

0

Switch to PostgreSQL and use PostGIS. It's a geospatial extension to PostgreSQL which does this sort of thing natively and efficiently.

If you're genuinely stuck with MySQL, then no dice, but PostgreqSQL is free, open source, easy to use, and fast, so it's an entirely viable alternative to MySQL (to say the least) if you have the ability to choose.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.