How to put nested query in a JOIN?

Question

Let's say that I have the following two tables:

TABLE1

+-------+-------+-------+
| data1 | data2 | data3 |
+-------+-------+-------+
|     1 |    12 |    13 |
|     2 |    22 |    23 |
|     3 |    32 |    33 |
+-------+-------+-------+

TABLE2

+-------+-------+-------+
| data1 | data4 | data5 |
+-------+-------+-------+
|     1 |  NULL |   015 |
|     1 |    14 |   115 |
|     1 |    14 |   115 |
|     2 |  NULL |   025 |
|     2 |    24 |   125 |
|     2 |    24 |   125 |
|     3 |  NULL |   035 |
|     3 |    34 |   135 |
|     3 |    34 |   135 |
+-------+-------+-------+

And I have the following query:

SELECT TABLE1.data1,
       TABLE1.data2,
       TABLE1.data3,
       (SELECT TOP 1
               data4
        FROM TABLE2
        WHERE TABLE2.data1 = TABLE1.data1
          AND data4 IS NOT NULL) AS data4,
       (SELECT TOP 1
               data5
        FROM TABLE2
        WHERE TABLE2.data1 = TABLE1.data1
          AND data4 IS NOT NULL) AS data5
FROM TABLE1;

QUERY RESULT

+-------+-------+-------+-------+-------+
| data1 | data2 | data3 | data4 | data5 |
+-------+-------+-------+-------+-------+
|     1 |    12 |    13 |    14 |   115 |
|     2 |    22 |    23 |    24 |   125 |
|     3 |    32 |    33 |    34 |   135 |
+-------+-------+-------+-------+-------+

Assume that TABLE2 meets these two conditions:

For each data1, data4 is either NULL or has the same value in every row.
For each data1, data5 has one value in every row where data4 is NULL and another value in every row where data4 is not NULL.
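Any rewrite that collapses TABLE2 to one row per data1 depends on these two conditions, so it can be worth checking them on the real data first. A minimal sketch, using Python's built-in sqlite3 module with an in-memory copy of the sample data as a stand-in for SQL Server (an assumption for illustration; make_db and condition_violations are hypothetical helper names). The first condition is read as "data4 is either NULL or one repeated value per data1", which is what the sample data shows. The SQL uses CASE instead of a bare boolean so the same queries should also run on SQL Server.

```python
import sqlite3

# Sample data from the question. data5 is assumed to be an integer column,
# so the value displayed as 015 is stored as 15.
TABLE1_ROWS = [(1, 12, 13), (2, 22, 23), (3, 32, 33)]
TABLE2_ROWS = [
    (1, None, 15), (1, 14, 115), (1, 14, 115),
    (2, None, 25), (2, 24, 125), (2, 24, 125),
    (3, None, 35), (3, 34, 135), (3, 34, 135),
]


def make_db():
    """Build an in-memory SQLite copy of TABLE1 and TABLE2 (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER)")
    conn.execute("CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER)")
    conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", TABLE1_ROWS)
    conn.executemany("INSERT INTO TABLE2 VALUES (?, ?, ?)", TABLE2_ROWS)
    return conn


def condition_violations(conn):
    """Return the groups that break each condition; two empty lists mean both hold."""
    # Condition 1: per data1, all non-NULL data4 values are equal.
    # COUNT(DISTINCT ...) ignores NULLs, so the NULL rows are not counted.
    bad_data4 = conn.execute(
        "SELECT data1 FROM TABLE2 "
        "GROUP BY data1 HAVING COUNT(DISTINCT data4) > 1"
    ).fetchall()
    # Condition 2: per data1, data5 is constant among the NULL-data4 rows
    # and, separately, constant among the non-NULL-data4 rows.
    bad_data5 = conn.execute(
        "SELECT data1, CASE WHEN data4 IS NULL THEN 1 ELSE 0 END AS data4_is_null "
        "FROM TABLE2 "
        "GROUP BY data1, CASE WHEN data4 IS NULL THEN 1 ELSE 0 END "
        "HAVING COUNT(DISTINCT data5) > 1"
    ).fetchall()
    return bad_data4, bad_data5


print(condition_violations(make_db()))  # ([], [])
```

A non-empty list names the data1 groups where a one-row-per-data1 rewrite would have to pick between different values.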

Is there a way to rewrite the query so that there is no nested query in the SELECT list, maybe using a JOIN? I'm asking because I've noticed that the nested queries in the SELECT perform quite poorly. However, when I try a JOIN, I end up with duplicates, because the rows where data4 is not NULL are repeated.
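The duplication comes from how a join works: each TABLE1 row is paired with every TABLE2 row that matches, and the sample data has two non-NULL data4 rows per data1. The sketch below reproduces that in SQLite through Python's sqlite3 (a stand-in for SQL Server; the helper names are hypothetical), then shows one swapped-in fix that is not from the question: aggregate TABLE2 down to one row per data1 with GROUP BY before joining. That fix is only correct under the two conditions stated above.

```python
import sqlite3

# Sample data from the question (015 stored as the integer 15).
TABLE1_ROWS = [(1, 12, 13), (2, 22, 23), (3, 32, 33)]
TABLE2_ROWS = [
    (1, None, 15), (1, 14, 115), (1, 14, 115),
    (2, None, 25), (2, 24, 125), (2, 24, 125),
    (3, None, 35), (3, 34, 135), (3, 34, 135),
]


def make_db():
    """Build an in-memory SQLite copy of TABLE1 and TABLE2 (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER)")
    conn.execute("CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER)")
    conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", TABLE1_ROWS)
    conn.executemany("INSERT INTO TABLE2 VALUES (?, ?, ?)", TABLE2_ROWS)
    return conn


def naive_join(conn):
    # A plain JOIN returns one output row per matching TABLE2 row, so every
    # TABLE1 row comes back twice (two non-NULL data4 rows per data1).
    return conn.execute(
        "SELECT t1.data1, t1.data2, t1.data3, t2.data4, t2.data5 "
        "FROM TABLE1 AS t1 "
        "JOIN TABLE2 AS t2 ON t2.data1 = t1.data1 AND t2.data4 IS NOT NULL "
        "ORDER BY t1.data1"
    ).fetchall()


def grouped_join(conn):
    # Collapse TABLE2 to one row per data1 before joining. MAX() is safe only
    # because of the stated conditions: all non-NULL rows for a data1 carry the
    # same data4 and data5, so MAX simply returns that single value.
    # LEFT JOIN keeps TABLE1 rows without a match, as the scalar subqueries do.
    return conn.execute(
        "SELECT t1.data1, t1.data2, t1.data3, t2.data4, t2.data5 "
        "FROM TABLE1 AS t1 "
        "LEFT JOIN (SELECT data1, MAX(data4) AS data4, MAX(data5) AS data5 "
        "           FROM TABLE2 WHERE data4 IS NOT NULL GROUP BY data1) AS t2 "
        "  ON t2.data1 = t1.data1 "
        "ORDER BY t1.data1"
    ).fetchall()


conn = make_db()
print(len(naive_join(conn)))  # 6
print(grouped_join(conn))
# [(1, 12, 13, 14, 115), (2, 22, 23, 24, 125), (3, 32, 33, 34, 135)]
```

The SQL strings use only standard syntax, so they should carry over to SQL Server unchanged, but that has not been checked here.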

Whitespace and line breaks are paramount to making text readable, not just code. Please get into the habit of making good use of both. Poor formatting is not going to help you or others when you need to read and understand your code quickly. Good formatting makes it easy to distinguish specific code blocks and sections, and makes finding errors far easier when a line contains tens of characters rather than hundreds.
– Thom A ♦, Commented Dec 9, 2021 at 12:49
Note that a TOP without an ORDER BY is a sure sign of a flaw: the data engine is free to return whatever arbitrary row it wants, and that row could differ every time you run the query. If you use TOP, make sure the query has an ORDER BY so that you get consistent and reliable results.
– Thom A ♦, Commented Dec 9, 2021 at 12:52
You can either join with ROW_NUMBER() or use a lateral query (OUTER APPLY in SQL Server).
– The Impaler, Commented Dec 9, 2021 at 13:01
As @Larnu says, TOP without ORDER BY rarely makes sense.
– The Impaler, Commented Dec 9, 2021 at 13:02
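The ROW_NUMBER() join suggested in the comments could look like the sketch below, again using Python's sqlite3 as an in-memory stand-in for SQL Server (window functions need SQLite 3.25 or later; make_db and row_number_join are hypothetical names). It also follows the ORDER BY advice: the window's ORDER BY is what decides which row counts as the first one.

```python
import sqlite3

# Sample data from the question (015 stored as the integer 15).
TABLE1_ROWS = [(1, 12, 13), (2, 22, 23), (3, 32, 33)]
TABLE2_ROWS = [
    (1, None, 15), (1, 14, 115), (1, 14, 115),
    (2, None, 25), (2, 24, 125), (2, 24, 125),
    (3, None, 35), (3, 34, 135), (3, 34, 135),
]


def make_db():
    """Build an in-memory SQLite copy of TABLE1 and TABLE2 (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER)")
    conn.execute("CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER)")
    conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", TABLE1_ROWS)
    conn.executemany("INSERT INTO TABLE2 VALUES (?, ?, ?)", TABLE2_ROWS)
    return conn


def row_number_join(conn):
    # Number the non-NULL rows within each data1 and keep only rn = 1.
    # ORDER BY data5 is a hypothetical choice of "first" row; since the
    # candidate rows are identical in this data, ties do not change the result.
    return conn.execute(
        "SELECT t1.data1, t1.data2, t1.data3, t2.data4, t2.data5 "
        "FROM TABLE1 AS t1 "
        "LEFT JOIN (SELECT data1, data4, data5, "
        "                  ROW_NUMBER() OVER (PARTITION BY data1 ORDER BY data5) AS rn "
        "           FROM TABLE2 WHERE data4 IS NOT NULL) AS t2 "
        "  ON t2.data1 = t1.data1 AND t2.rn = 1 "
        "ORDER BY t1.data1"
    ).fetchall()


print(row_number_join(make_db()))
# [(1, 12, 13, 14, 115), (2, 22, 23, 24, 125), (3, 32, 33, 34, 135)]
```

Putting rn = 1 in the ON clause rather than in a WHERE keeps TABLE1 rows that have no qualifying TABLE2 row, which matches what the original scalar subqueries return.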

MatBailie · Accepted Answer · 2021-12-09 14:57:08Z

You can use OUTER APPLY (or CROSS APPLY, if TABLE1 rows with no matching TABLE2 row should be dropped rather than returned with NULLs):

SELECT TABLE1.data1,
       TABLE1.data2,
       TABLE1.data3,
       t2.data4,
       t2.data5
FROM TABLE1
OUTER APPLY (SELECT TOP 1
                    TABLE2.data4,
                    TABLE2.data5
             FROM TABLE2
             WHERE TABLE2.data1 = TABLE1.data1
               AND TABLE2.data4 IS NOT NULL
             -- TOP should have an ORDER BY, otherwise the row returned is not guaranteed.
             -- SomeColumn is a placeholder for the column that defines the "first" row.
             ORDER BY TABLE2.SomeColumn) AS t2;
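The difference between the two operators shows up when a TABLE1 row has no qualifying TABLE2 row: OUTER APPLY returns it with NULLs, while CROSS APPLY drops it. SQLite has no APPLY, so this sketch emulates both with a ROW_NUMBER()-filtered derived table joined by LEFT JOIN or by an inner JOIN, which is equivalent here because the TOP 1 subquery yields at most one row per data1. The extra row (4, 42, 43) is hypothetical, added only to show the difference; make_db and apply_like are hypothetical helper names.

```python
import sqlite3

# Sample data from the question (015 stored as the integer 15).
TABLE1_ROWS = [(1, 12, 13), (2, 22, 23), (3, 32, 33)]
TABLE2_ROWS = [
    (1, None, 15), (1, 14, 115), (1, 14, 115),
    (2, None, 25), (2, 24, 125), (2, 24, 125),
    (3, None, 35), (3, 34, 135), (3, 34, 135),
]


def make_db():
    """Build an in-memory SQLite copy of TABLE1 and TABLE2 (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER)")
    conn.execute("CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER)")
    conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", TABLE1_ROWS)
    conn.executemany("INSERT INTO TABLE2 VALUES (?, ?, ?)", TABLE2_ROWS)
    return conn


def apply_like(conn, join_kind):
    # "LEFT JOIN" keeps unmatched TABLE1 rows (like OUTER APPLY);
    # "JOIN" drops them (like CROSS APPLY).
    if join_kind not in ("LEFT JOIN", "JOIN"):
        raise ValueError(join_kind)
    sql = (
        "WITH first_match AS ("
        "  SELECT data1, data4, data5, "
        "         ROW_NUMBER() OVER (PARTITION BY data1 ORDER BY data5) AS rn "
        "  FROM TABLE2 WHERE data4 IS NOT NULL) "
        "SELECT t1.data1, t2.data4, t2.data5 FROM TABLE1 AS t1 "
        f"{join_kind} first_match AS t2 ON t2.data1 = t1.data1 AND t2.rn = 1 "
        "ORDER BY t1.data1"
    )
    return conn.execute(sql).fetchall()


conn = make_db()
# Hypothetical extra row with no matching rows in TABLE2.
conn.execute("INSERT INTO TABLE1 VALUES (4, 42, 43)")
print(apply_like(conn, "LEFT JOIN"))
# [(1, 14, 115), (2, 24, 125), (3, 34, 135), (4, None, None)]
print(apply_like(conn, "JOIN"))
# [(1, 14, 115), (2, 24, 125), (3, 34, 135)]
```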

edited Dec 9, 2021 at 14:57

MatBailie

answered Dec 9, 2021 at 14:49

Charlieface

marcothesane · Accepted Answer · 2021-12-09 13:09:35Z

I notice that in your TABLE2, once the rows with a NULL data4 are excluded, the remaining rows for each data1 are identical. So a SELECT DISTINCT is easy to code and, although it is resource intensive (in essence it is a GROUP BY over all columns), it is good enough for this example. If your data does contain differences, the result will suddenly contain duplicates you did not expect, and that will point you to where the data needs further investigation.

That said, here you go:

WITH
-- your input ..
tb1(data1,data2,data3) AS (
          SELECT 1,12,13
UNION ALL SELECT 2,22,23
UNION ALL SELECT 3,32,33
)
,
tb2(data1,data4,data5) AS (
          SELECT 1,NULL,015
UNION ALL SELECT 1,14,115
UNION ALL SELECT 1,14,115
UNION ALL SELECT 2,NULL,025
UNION ALL SELECT 2,24,125
UNION ALL SELECT 2,24,125
UNION ALL SELECT 3,NULL,035
UNION ALL SELECT 3,34,135
UNION ALL SELECT 3,34,135
)
-- end of your input.
-- Real Query starts here; replace following comma with "WITH" ..
,
tb2grp AS (
  SELECT DISTINCT
    *
  FROM tb2
  WHERE data4 IS NOT NULL
  -- chk  data1 | data4 | data5 
  -- chk -------+-------+-------
  -- chk      1 |    14 |   115
  -- chk      2 |    24 |   125
  -- chk      3 |    34 |   135
)
SELECT
  tb1.data1
, tb1.data2
, tb1.data3
, tb2.data4
, tb2.data5
FROM tb1 JOIN tb2grp AS tb2 ON tb2.data1 = tb1.data1  -- SQL Server does not support JOIN ... USING
ORDER BY tb1.data1;
-- out  data1 | data2 | data3 | data4 | data5 
-- out -------+-------+-------+-------+-------
-- out      1 |    12 |    13 |    14 |   115
-- out      2 |    22 |    23 |    24 |   125
-- out      3 |    32 |    33 |    34 |   135
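To try the DISTINCT approach without a SQL Server instance, here is a sketch running the same query shape against SQLite through Python's sqlite3 (a stand-in; it joins with ON, and make_db and distinct_join are hypothetical names). It also illustrates the point about differences in the data: one hypothetical TABLE2 row with a different data5 makes data1 = 1 appear twice in the result.

```python
import sqlite3

# Sample data from the question (015 stored as the integer 15).
TABLE1_ROWS = [(1, 12, 13), (2, 22, 23), (3, 32, 33)]
TABLE2_ROWS = [
    (1, None, 15), (1, 14, 115), (1, 14, 115),
    (2, None, 25), (2, 24, 125), (2, 24, 125),
    (3, None, 35), (3, 34, 135), (3, 34, 135),
]


def make_db():
    """Build an in-memory SQLite copy of TABLE1 and TABLE2 (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE1 (data1 INTEGER, data2 INTEGER, data3 INTEGER)")
    conn.execute("CREATE TABLE TABLE2 (data1 INTEGER, data4 INTEGER, data5 INTEGER)")
    conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?)", TABLE1_ROWS)
    conn.executemany("INSERT INTO TABLE2 VALUES (?, ?, ?)", TABLE2_ROWS)
    return conn


def distinct_join(conn):
    # Same shape as the answer: DISTINCT over the non-NULL rows, then an inner JOIN.
    return conn.execute(
        "WITH tb2grp AS ("
        "  SELECT DISTINCT data1, data4, data5 FROM TABLE2 WHERE data4 IS NOT NULL) "
        "SELECT tb1.data1, tb1.data2, tb1.data3, tb2.data4, tb2.data5 "
        "FROM TABLE1 AS tb1 JOIN tb2grp AS tb2 ON tb2.data1 = tb1.data1 "
        "ORDER BY tb1.data1"
    ).fetchall()


conn = make_db()
print(distinct_join(conn))
# [(1, 12, 13, 14, 115), (2, 22, 23, 24, 125), (3, 32, 33, 34, 135)]

# Break the assumption with a hypothetical row whose data5 differs:
conn.execute("INSERT INTO TABLE2 VALUES (1, 14, 999)")
print([row[0] for row in distinct_join(conn)])  # [1, 1, 2, 3]
```

Because the join is an inner JOIN, a TABLE1 row with no non-NULL data4 in TABLE2 would be dropped; a LEFT JOIN would keep it with NULLs.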
