I have a query that works well today but will run into scaling problems. The alternative I came up with is wildly slow, and I'm looking to speed that second query up.
Old query that won't scale well:
SELECT users.score
FROM users
WHERE users.id IN (
    SELECT user_id
    FROM companies_users
    WHERE companies_users.company_id = X
)
I then iterate over the returned scores to group them by value. Scores range from -10 to 10. The problem comes from the IN (SELECT ...) subquery and the iteration: the subquery could return over a million user_ids.
The alternative I've come up with should scale better but is wildly slow:
SELECT
    COUNT(*) AS total_scores,
    (SELECT COUNT(*) FROM users
     JOIN companies_users AS cu ON cu.user_id = users.id
     WHERE users.score = 10 AND cu.company_id = X) AS "10",
    (SELECT COUNT(*) FROM users
     JOIN companies_users AS cu ON cu.user_id = users.id
     WHERE users.score = 9 AND cu.company_id = X) AS "9",
    ...
    (SELECT COUNT(*) FROM users
     JOIN companies_users AS cu ON cu.user_id = users.id
     WHERE users.score = -9 AND cu.company_id = X) AS "-9",
    (SELECT COUNT(*) FROM users
     JOIN companies_users AS cu ON cu.user_id = users.id
     WHERE users.score = -10 AND cu.company_id = X) AS "-10"
FROM users
JOIN companies_users AS cu ON cu.user_id = users.id
WHERE cu.company_id = X
The first query's results have to be iterated over before they are usable. The second returns the counts ready to use.
Is there a way to pull the JOIN out of the nested SELECTs? That seems to be causing the majority of the slowdown in the second query. Also, am I right that the first query won't scale well when dealing with millions of ids?
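For reference, pulling the JOIN out might look something like the sketch below: a single join, with the database doing the grouping instead of one subquery per score. This is untested against real data and not benchmarked. X is the same company id placeholder as above, score_count is a made-up alias, and the join condition assumes companies_users.user_id references users.id.

```sql
-- Sketch only: X is the company id placeholder used in the queries above.
-- Assumption: companies_users.user_id references users.id.
SELECT
    users.score,
    COUNT(*) AS score_count      -- hypothetical alias: number of users with this score
FROM users
JOIN companies_users AS cu ON cu.user_id = users.id
WHERE cu.company_id = X
GROUP BY users.score
ORDER BY users.score DESC;
```

Note that this returns one row per score value rather than one column per score, so a score that no user in the company has is simply absent from the result instead of showing up as 0.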