
I am dynamically building a SELECT statement that fetches an item and potentially several lists of related items.

The ultimate goal is an object in application space with arrays of ids for each of the related types.

Using a list of JOINs is pretty straightforward:

SELECT items.*, item_has_related_1.related1_id, item_has_related_2.related2_id, ...
FROM (items)
LEFT JOIN item_has_related_1 ON item_has_related_1.item_id = items.id
LEFT JOIN item_has_related_2 ON item_has_related_2.item_id = items.id
... potentially many more
WHERE items.id = $itemId;

LEFT JOIN is used because some relationships might be empty.

The most obvious problem with this is that the number of rows returned is the product of the number of matches in each join. With just a few joined tables that number could get very large. If there were five tables with six matches each, there would be 6^5 = 7,776 rows! A secondary problem is that processing the returned rows is more complex, as I have to dig out the unique values in each column.

As an alternative, I have written something like this, which essentially does a separate query for each JOIN:

SELECT items.*, item_has_related_1.related1_id, NULL as related2_id, ...
FROM (items)
JOIN item_has_related_1 ON item_has_related_1.item_id = items.id
WHERE items.id = $itemId

UNION

SELECT items.*, NULL as related1_id, item_has_related_2.related2_id, ...
FROM (items)
JOIN item_has_related_2 ON item_has_related_2.item_id = items.id
WHERE items.id = $itemId

The number of rows returned this way is the sum of the number of matches in all joins. However, query prep time is much longer and so for smaller datasets this method is less efficient. I have tried to empirically determine the definition of "smaller", but with my test data I'm not sure if my results are meaningful.
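The difference in row counts can be checked with a small sketch. It is only an illustration under assumptions: SQLite (through Python's sqlite3 module) stands in for MySQL, the schema and ids are made up, and a third table item_has_related_3 is added to make the growth visible.

```python
import sqlite3

# Hypothetical schema mirroring the question; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_has_related_1 (item_id INTEGER, related1_id INTEGER);
    CREATE TABLE item_has_related_2 (item_id INTEGER, related2_id INTEGER);
    CREATE TABLE item_has_related_3 (item_id INTEGER, related3_id INTEGER);
    INSERT INTO items VALUES (1, 'widget');
""")

# Give item 1 six related ids in each of the three link tables.
for n in (1, 2, 3):
    conn.executemany(
        f"INSERT INTO item_has_related_{n} VALUES (1, ?)",
        [(100 * n + k,) for k in range(6)],
    )

# All joins in one query: one row per combination of matches.
joined = conn.execute("""
    SELECT items.id, r1.related1_id, r2.related2_id, r3.related3_id
    FROM items
    LEFT JOIN item_has_related_1 AS r1 ON r1.item_id = items.id
    LEFT JOIN item_has_related_2 AS r2 ON r2.item_id = items.id
    LEFT JOIN item_has_related_3 AS r3 ON r3.item_id = items.id
    WHERE items.id = ?
""", (1,)).fetchall()

# One SELECT per link table, combined with UNION: one row per match.
unioned = conn.execute("""
    SELECT items.id, related1_id, NULL AS related2_id, NULL AS related3_id
    FROM items JOIN item_has_related_1 ON item_id = items.id WHERE items.id = :id
    UNION
    SELECT items.id, NULL, related2_id, NULL
    FROM items JOIN item_has_related_2 ON item_id = items.id WHERE items.id = :id
    UNION
    SELECT items.id, NULL, NULL, related3_id
    FROM items JOIN item_has_related_3 ON item_id = items.id WHERE items.id = :id
""", {"id": 1}).fetchall()

print(len(joined), len(unioned))  # prints "216 18"
```

With three tables of six matches each, the combined join yields 6 * 6 * 6 = 216 rows, while the UNION yields 6 + 6 + 6 = 18. This says nothing about timing, which would still have to be measured against real MySQL data.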

Is there a more efficient way to perform multiple JOINs and combine the results, or is there another approach to this problem?

EDITED TO ADD: Barmar has the right answer to my question, but my very next step was expanding the WHERE clause to return multiple rows. Referring to this question, my code ended up looking like this:

SELECT items.*,
(SELECT GROUP_CONCAT(related1_id) FROM item_has_related_1 WHERE item_id = items.id) as related1Ids,
(SELECT GROUP_CONCAT(related2_id) FROM item_has_related_2 WHERE item_id = items.id) as related2Ids,
...
FROM items
WHERE <where criteria>
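As a rough check of the correlated-subquery form, here is a minimal sketch that runs the same shape of query against made-up data. SQLite is used as a stand-in for MySQL (both provide GROUP_CONCAT), and the rows, ids, and the IN (1, 2) filter are assumptions for illustration only.

```python
import sqlite3

# Hypothetical data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_has_related_1 (item_id INTEGER, related1_id INTEGER);
    CREATE TABLE item_has_related_2 (item_id INTEGER, related2_id INTEGER);
    INSERT INTO items VALUES (1, 'a'), (2, 'b');
    INSERT INTO item_has_related_1 VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO item_has_related_2 VALUES (1, 20);
""")

# One output row per item; each subquery collapses one relationship
# into a comma-separated string, or NULL when there are no matches.
rows = conn.execute("""
    SELECT items.id,
      (SELECT GROUP_CONCAT(related1_id) FROM item_has_related_1
        WHERE item_id = items.id) AS related1Ids,
      (SELECT GROUP_CONCAT(related2_id) FROM item_has_related_2
        WHERE item_id = items.id) AS related2Ids
    FROM items
    WHERE items.id IN (1, 2)
    ORDER BY items.id
""").fetchall()

for row in rows:
    print(row)
```

Item 2 has no rows in item_has_related_2, so its related2Ids comes back as NULL (None in Python) rather than removing the item from the result. The order of ids inside each concatenated string is not guaranteed unless the query asks for one.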
4 Comments
  • It seems to me that you should not use left join, but inner join instead. Can you elaborate on why you are using it? Commented Jul 5, 2018 at 17:59
  • You are right, inner join works in this case, but the result set still grows exponentially. I'll edit my question. Commented Jul 5, 2018 at 18:03
  • I am not really sure it makes much sense to combine the joins into one query in the first place. I think what you are doing with the results matters here. Since the "combined joins" and "separate join queries" approaches yield quite different result sets, you would have to either collapse or expand the results on the client side to use them in the same manner. Commented Jul 5, 2018 at 18:13
  • Thanks for the feedback. I added a bit of clarification to the question; ultimately I will reduce the results of each join to a discrete list. Commented Jul 5, 2018 at 18:24

2 Answers


You can use GROUP_CONCAT to get all the related items from each table into a comma-separated list in the result.

SELECT items.*, related1_ids, related2_ids, ...
FROM items
LEFT JOIN (
    SELECT item_id, GROUP_CONCAT(related1_id) AS related1_ids
    FROM item_has_related_1
    WHERE item_id = $itemId
) AS r1 ON items.id = r1.item_id
LEFT JOIN (
    SELECT item_id, GROUP_CONCAT(related2_id) AS related2_ids
    FROM item_has_related_2
    WHERE item_id = $itemId
) AS r2 ON items.id = r2.item_id
...

Later you can split them up in the application language.
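For example, the splitting step might look like the sketch below. It is written in Python purely for illustration; split_ids and the sample row are hypothetical names, and it assumes the ids are integers and that an empty relationship arrives as NULL.

```python
from typing import Optional


def split_ids(concatenated: Optional[str]) -> list[int]:
    """Turn a GROUP_CONCAT result such as '3,7,12' into [3, 7, 12].

    NULL (None) or an empty string means the relationship had no rows.
    """
    if not concatenated:
        return []
    return [int(part) for part in concatenated.split(",")]


# Hypothetical row as it might come back from the query above.
row = {"id": 1, "related1_ids": "10,11", "related2_ids": None}

item = {
    "id": row["id"],
    "related1": split_ids(row["related1_ids"]),
    "related2": split_ids(row["related2_ids"]),
}
print(item)  # prints {'id': 1, 'related1': [10, 11], 'related2': []}
```

One thing to keep in mind: MySQL truncates GROUP_CONCAT output at group_concat_max_len, which defaults to 1024, so very long id lists may need that setting raised.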


4 Comments

Not surprisingly, this has about the same performance numbers as the UNION approach, but the result is definitely easy to work with, and it should be easier to generate this query in code. I did make two optimizations: I used INNER JOIN, and since items.id is a given, I substituted the value in the ON conditions. In PHP: ... ON r1.item_id = $itemId. It hadn't occurred to me to do that before; I'm not sure if that sped anything up, but it certainly won't slow it down.
If you use INNER JOIN and the item is entirely missing in one of the related tables, you won't get any result.
The code already filters it down to r1.item_id = $itemId in the subquery. It shouldn't really matter which you use in the ON clause, they're equivalent.
Got it. Good point. Note to self: add tests with empty relationships.

You can simply write the query with inner joins like this:

SELECT items.*, item_has_related_1.related1_id, item_has_related_2.related2_id, ...
FROM (items)
INNER JOIN item_has_related_1 ON item_has_related_1.item_id = items.id
INNER JOIN item_has_related_2 ON item_has_related_2.item_id = items.id
... potentially many more
WHERE items.id = $itemId;

This query returns one row for every combination of matches for $itemId across the joined tables.

The thing is, if you need all of the data listed in the SELECT statement, you will have to do the work of joining all the tables even if the queries are separate, so splitting them gains nothing over doing all the joins together as listed here.

3 Comments

Thanks for the answer. I just modified my query to use INNER JOIN, but I still get an exponentially growing result set.
@Jerry Check my added comment in the answer. There is no way around this. To be precise, it is not exponential growth but a Cartesian product.
I'm not sure I understand your comment about not gaining anything - with all the JOIN statements combined as you listed, I could easily get 10,000 rows to sort through, where using the UNION approach would give me 40 rows, each with one interesting piece of information. Perhaps I am operating under a misconception about how MySQL optimizes that makes doing all the joins at once efficient enough to offset the huge result set?
