I have 2 tables.

books (id, sku, name, description, date_added)

and

books_views (id, sku, date_viewed)

I am trying to write an optimized query to do the following.

  1. To find the most viewed books for the last week
  2. To find the most viewed books for the last month

The books_views table has more than 4 million entries. What would be the best way to get the data sorted by views for week and month?

  • Where is the number of views stored? What have you tried? Commented Mar 27, 2012 at 19:09
  • @jordanm: Presumably, there is a new row added to books_views for each view. Commented Mar 27, 2012 at 19:10
  • A new row is added to books_views for each view. I tried creating a table every night, books_weekly_views (sku, count), and then querying that, but I don't think that is efficient. Commented Mar 27, 2012 at 19:12
  • Is one of the fields a foreign key? Commented Mar 27, 2012 at 19:14

2 Answers

The query is:

SELECT sku, COUNT(*) AS times_viewed
FROM books_views bv
WHERE date_viewed > DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY sku
ORDER BY times_viewed DESC;

To get the views for the month, change the interval to 30 days.

To make it fast, make sure the tables are indexed properly. You will definitely want an index on date_viewed. If you also want the book names, index the sku column in both tables. Here is the query with the book names included:

SELECT bv.sku, b.name, COUNT(*) AS times_viewed
FROM books_views bv JOIN books b ON bv.sku = b.sku
WHERE bv.date_viewed > DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY bv.sku, b.name
ORDER BY times_viewed DESC;
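
The indexing advice above might translate into something like the following. This is only a sketch: the index names are made up for illustration, and the composite index is an optional extra that the answer does not mention. Run EXPLAIN on the query to check which index MySQL actually picks.

```sql
-- Index names are hypothetical; use your own naming scheme.

-- Lets the WHERE clause read only the rows from the last week or month.
CREATE INDEX idx_books_views_date_viewed ON books_views (date_viewed);

-- Optional alternative (an assumption, not part of the answer): a composite
-- index that contains both the filter column and the grouping column, so the
-- count may be computed from the index without touching the table rows.
CREATE INDEX idx_books_views_date_sku ON books_views (date_viewed, sku);

-- Supports the join from books_views to books on sku.
CREATE INDEX idx_books_sku ON books (sku);

-- Inspect the execution plan of the weekly query.
EXPLAIN
SELECT sku, COUNT(*) AS times_viewed
FROM books_views
WHERE date_viewed > DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY sku
ORDER BY times_viewed DESC;
```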

5 Comments

How can you have an index on times_viewed? It's an alias.
Works nicely, but it takes about 5.3 seconds. Is that OK?
That slowness is not surprising. Chances are you'll want to do nightly rollups and query the rollup table instead to keep it fast.
Yes, I guess I will create book_views_monthly and book_views_weekly, use cron to regenerate those tables every night, and query only those. Doing that is extremely fast.
Is date_viewed a DATE or DATETIME field?
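
The nightly rollup discussed in these comments could be sketched roughly as below. The table name books_weekly_views comes from the asker's comment on the question; the column types, the times_viewed column name, and the LIMIT are assumptions. A monthly table would be built the same way with a 30-day interval.

```sql
-- Summary table: one row per sku. Column types are assumed; match books.sku.
CREATE TABLE IF NOT EXISTS books_weekly_views (
    sku          VARCHAR(64)  NOT NULL,
    times_viewed INT UNSIGNED NOT NULL,
    PRIMARY KEY (sku)
);

-- Nightly job (for example, run from cron): rebuild the summary from scratch.
-- Note that the table is briefly empty between the two statements.
TRUNCATE TABLE books_weekly_views;

INSERT INTO books_weekly_views (sku, times_viewed)
SELECT sku, COUNT(*)
FROM books_views
WHERE date_viewed > DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY sku;

-- Page query: reads the small summary table instead of the 4-million-row log.
SELECT b.sku, b.name, w.times_viewed
FROM books_weekly_views w
JOIN books b ON b.sku = w.sku
ORDER BY w.times_viewed DESC
LIMIT 10;  -- assumed page size
```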
Unless your books_views table has other fields that you are not showing here, you should change it to books_views (sku, date_viewed, views), with the primary key on (sku, date_viewed). Each row then holds one day's view count for one book instead of a single view.
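
As a rough sketch, the restructured table could be declared like this. The column types and the extra index on date_viewed are assumptions added for illustration; only the three columns and the primary key come from the answer.

```sql
-- One row per book per day. Types are assumed; match books.sku.
CREATE TABLE books_views (
    sku         VARCHAR(64)  NOT NULL,
    date_viewed DATE         NOT NULL,
    views       INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (sku, date_viewed),
    -- Assumption, not in the answer: helps queries that filter by date only,
    -- since the primary key leads with sku.
    KEY idx_date_viewed (date_viewed)
);
```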

Then change your insert to an INSERT ... ON DUPLICATE KEY UPDATE:

INSERT INTO books_views VALUES ('sku', CURRENT_DATE, 1)
    ON DUPLICATE KEY UPDATE views = views + 1;

If you want the best performance, and assuming there are more updates than inserts, you could try the update first:

UPDATE books_views
SET views = views + 1
WHERE sku = 'sku'
AND date_viewed = CURRENT_DATE;

Then check the number of affected rows, and run the insert only if no rows were affected:

INSERT INTO books_views VALUES ('sku', CURRENT_DATE, 1);
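
With one row per book per day, the original question (most viewed books for the last week or month) becomes a sum over a much smaller table. A minimal sketch, assuming the (sku, date_viewed, views) layout above and that date_viewed is a DATE column; the LIMIT is an arbitrary choice:

```sql
-- Most viewed books over the last 7 calendar days (today included).
-- Use INTERVAL 30 DAY for the monthly ranking.
SELECT bv.sku, b.name, SUM(bv.views) AS times_viewed
FROM books_views bv
JOIN books b ON b.sku = bv.sku
WHERE bv.date_viewed > DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
GROUP BY bv.sku, b.name
ORDER BY times_viewed DESC
LIMIT 10;  -- assumed page size
```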
