I have a query that works well today but will not scale. The alternative I came up with is wildly slow, and I'm looking to speed that second query up.

Old query that won't scale well:

SELECT users.score
FROM users
WHERE
  users.id IN (
    SELECT user_id
    FROM companies_users
    WHERE companies_users.company_id = X
)

Then I would iterate across the different scores to group them. Scores range from -10 to 10. The problem comes from the IN SELECT statement and the iteration. There could be over a million user_ids returned.

The alternative I've come up with should scale better but is wildly slow:

SELECT
  COUNT(*) as total_scores,
  (SELECT COUNT(*) FROM users
    JOIN companies_users as cu ON cu.user_id = users.id
    WHERE users.score = 10 AND cu.company_id = X) as "10",
  (SELECT COUNT(*) FROM users
    JOIN companies_users as cu ON cu.user_id = users.id
    WHERE users.score = 9 AND cu.company_id = X) as "9",
...
  (SELECT COUNT(*) FROM users
    JOIN companies_users as cu ON cu.user_id = users.id
    WHERE users.score = -9 AND cu.company_id = X) as "-9",
  (SELECT COUNT(*) FROM users
    JOIN companies_users as cu ON cu.user_id = users.id
    WHERE users.score = -10 AND cu.company_id = X) as "-10"
FROM users
  JOIN companies_users as cu ON cu.user_id = users.id
  WHERE cu.company_id = X

The first query requires iteration to get into working data. The second is good to go.

Is there a way to pull the JOIN out of the nested SELECTs? That seems to be causing the majority of the slowdown in the second query. Also, am I right that the first query won't scale well when dealing with millions of ids?

1 Answer

What would be the problem with:

SELECT u.score
FROM companies_users cu
    JOIN users u ON cu.user_id = u.id
WHERE cu.company_id=?
GROUP BY u.score
ORDER BY u.score

?

Also, do you have appropriate indexes? You need one on companies_users(company_id) and one on users(id). You may also try adding one on companies_users(user_id), in case the planner decides it's better to run the join the other way around. EXPLAIN and EXPLAIN ANALYZE are your friends.
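
As a rough sketch of the indexing advice, assuming PostgreSQL-style syntax: the table definitions below are a guessed minimal schema, the index names are made up for illustration, and 42 is a stand-in for a real company id.

```sql
-- Assumed minimal schema (hypothetical; the real tables likely have more columns).
CREATE TABLE users (
    id    integer PRIMARY KEY,  -- a primary key already provides the index on users(id)
    score integer NOT NULL
);
CREATE TABLE companies_users (
    company_id integer NOT NULL,
    user_id    integer NOT NULL REFERENCES users (id)
);

-- Index names are illustrative, not taken from the question.
CREATE INDEX companies_users_company_id_idx ON companies_users (company_id);
CREATE INDEX companies_users_user_id_idx ON companies_users (user_id);

-- Run the query and report the chosen plan with actual timings.
EXPLAIN ANALYZE
SELECT u.score
FROM companies_users cu
    JOIN users u ON cu.user_id = u.id
WHERE cu.company_id = 42
GROUP BY u.score
ORDER BY u.score;
```

If the plan still shows a sequential scan on companies_users for a selective company_id, that is usually the first thing to look into.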

1 Comment

Thanks for the reply! That is pretty close to perfect. I'm actually looking for the counts on the different scores. I used your solution but changed the select portion to u.score, count(u.score) and have got all the data! Thanks again.
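
For reference, the adjusted query described here could look roughly like the sketch below. The alias total and the literal 42 (a placeholder company id) are assumptions. Scores that no user in the company has will simply be absent from the result rather than appearing with a count of 0.

```sql
-- One row per score, with the number of users in the company holding it.
SELECT u.score, COUNT(u.score) AS total
FROM companies_users cu
    JOIN users u ON cu.user_id = u.id
WHERE cu.company_id = 42
GROUP BY u.score
ORDER BY u.score;

-- Alternative sketch: a single row with one column per score, using one join
-- and conditional aggregation. Only two score columns are shown here; the
-- others would follow the same pattern.
SELECT
    COUNT(*) AS total_scores,
    SUM(CASE WHEN u.score = 10 THEN 1 ELSE 0 END) AS "10",
    SUM(CASE WHEN u.score = -10 THEN 1 ELSE 0 END) AS "-10"
FROM companies_users cu
    JOIN users u ON cu.user_id = u.id
WHERE cu.company_id = 42;
```

The second form keeps the one-row layout of the original slow query while joining the tables only once.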
