Let's say that I have the following two tables:

TABLE1

+-------+-------+-------+
| data1 | data2 | data3 |
+-------+-------+-------+
|     1 |    12 |    13 |
|     2 |    22 |    23 |
|     3 |    32 |    33 |
+-------+-------+-------+

TABLE2

+-------+-------+-------+
| data1 | data4 | data5 |
+-------+-------+-------+
|     1 |  NULL |   015 |
|     1 |    14 |   115 |
|     1 |    14 |   115 |
|     2 |  NULL |   025 |
|     2 |    24 |   125 |
|     2 |    24 |   125 |
|     3 |  NULL |   035 |
|     3 |    34 |   135 |
|     3 |    34 |   135 |
+-------+-------+-------+

And I have the following query:

SELECT TABLE1.data1,
       TABLE1.data2,
       TABLE1.data3,
       (SELECT TOP 1
               data4
        FROM TABLE2
        WHERE data1 = TABLE1.data1
          AND data4 IS NOT NULL),
       (SELECT TOP 1
               data5
        FROM TABLE2
        WHERE data1 = TABLE1.data1
          AND data4 IS NOT NULL)
FROM TABLE1;

QUERY RESULT

+-------+-------+-------+-------+-------+
| data1 | data2 | data3 | data4 | data5 |
+-------+-------+-------+-------+-------+
|     1 |    12 |    13 |    14 |   115 |
|     2 |    22 |    23 |    24 |   125 |
|     3 |    32 |    33 |    34 |   135 |
+-------+-------+-------+-------+-------+

Assume that TABLE2 meets these two conditions:

  1. For each data1, data4 is either NULL or has the same value in every row where it is not NULL.
  2. For each data1, data5 has one value in every row where data4 is NULL and another value in every row where data4 is not NULL.

Is there a way to rewrite the query so that it has no nested query in the SELECT list, perhaps by using a JOIN? I'm asking because the nested queries in the SELECT perform quite poorly. However, when I try a JOIN, I end up with duplicate rows, because each data1 has several TABLE2 rows where data4 is not NULL.

  • Whitespace and line breaks are paramount to making text readable, and not just in code. Please get into the habit of making good use of both. Poor formatting will not help you or others when you need to read and understand your code quickly. Good formatting makes specific code blocks and sections easy to distinguish, and it makes errors far easier to find when a line contains tens of characters rather than hundreds. Commented Dec 9, 2021 at 12:49
  • Note that a TOP without an ORDER BY is a sure sign of a flaw. It means the data engine is free to return whatever arbitrary value it wants, and that value could be different every time you run the query. If you are using TOP, you need to ensure the query has an ORDER BY so that you get consistent and reliable results. Commented Dec 9, 2021 at 12:52
  • Do one JOIN instead of the two subqueries. Commented Dec 9, 2021 at 12:56
  • You can either join using ROW_NUMBER or do a lateral query. Commented Dec 9, 2021 at 13:01
  • As @Larnu says, TOP without ORDER BY rarely makes sense. Commented Dec 9, 2021 at 13:02

2 Answers

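You can use OUTER APPLY or CROSS APPLY. OUTER APPLY keeps the TABLE1 rows that have no matching TABLE2 row and returns NULL for data4 and data5, as the original subqueries do; CROSS APPLY drops those rows.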

SELECT TABLE1.data1,
       TABLE1.data2,
       TABLE1.data3,
       t2.data4,
       t2.data5
FROM TABLE1
OUTER APPLY (SELECT TOP 1
                    data4,
                    data5
             FROM TABLE2 t2
             WHERE t2.data1 = TABLE1.data1
               AND t2.data4 IS NOT NULL
             -- TOP should have an ORDER BY, otherwise the result is not guaranteed.
             -- SomeColumn is a placeholder for whichever column defines "first".
             ORDER BY t2.SomeColumn
) t2;
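
The ROW_NUMBER alternative mentioned in the comments could look roughly like the sketch below. It is only an illustration: the CTE name `ranked`, the aliases `t1` and `r`, and the choice of `data5` as the ordering column are assumptions, not something taken from the question. Under the two conditions stated in the question, every non-NULL row for a given data1 carries the same data4 and data5, so any ordering should pick an equivalent row.

```sql
-- Sketch: number the non-NULL rows of TABLE2 within each data1,
-- keep only the first one, and join once instead of running
-- two correlated subqueries per TABLE1 row.
WITH ranked AS (
    SELECT data1,
           data4,
           data5,
           ROW_NUMBER() OVER (PARTITION BY data1
                              ORDER BY data5) AS rn  -- assumed tie-breaker
    FROM TABLE2
    WHERE data4 IS NOT NULL
)
SELECT t1.data1,
       t1.data2,
       t1.data3,
       r.data4,
       r.data5
FROM TABLE1 AS t1
LEFT JOIN ranked AS r
       ON r.data1 = t1.data1
      AND r.rn = 1;   -- at most one TABLE2 row per data1, so no duplicates
```

The LEFT JOIN mirrors OUTER APPLY: a TABLE1 row without any non-NULL match is still returned, with NULL in data4 and data5.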

I notice that in your TABLE2, apart from the rows where data4 is NULL, the rows for each data1 do not differ. So a SELECT DISTINCT is easy to code and, although resource intensive (it is in essence a GROUP BY on every column), good enough for this example. If the rows ever do differ, the result will suddenly contain duplicates that you did not expect, and that will guide you to further data investigation.

That said, here you go:

WITH
-- your input ..
tb1(data1,data2,data3) AS (
          SELECT 1,12,13
UNION ALL SELECT 2,22,23
UNION ALL SELECT 3,32,33
)
,
tb2(data1,data4,data5) AS (
          SELECT 1,NULL,015
UNION ALL SELECT 1,14,115
UNION ALL SELECT 1,14,115
UNION ALL SELECT 2,NULL,025
UNION ALL SELECT 2,24,125
UNION ALL SELECT 2,24,125
UNION ALL SELECT 3,NULL,035
UNION ALL SELECT 3,34,135
UNION ALL SELECT 3,34,135
)
-- end of your input.
-- Real Query starts here; replace following comma with "WITH" ..
,
tb2grp AS (
  SELECT DISTINCT
    *
  FROM tb2
  WHERE data4 IS NOT NULL
  -- chk  data1 | data4 | data5 
  -- chk -------+-------+-------
  -- chk      1 |    14 |   115
  -- chk      2 |    24 |   125
  -- chk      3 |    34 |   135
)
SELECT
  tb1.data1
, tb1.data2
, tb1.data3
, tb2.data4
, tb2.data5
-- JOIN ... USING is not available in SQL Server, so the join is written with ON
FROM tb1 JOIN tb2grp AS tb2 ON tb2.data1 = tb1.data1
ORDER BY tb1.data1;
-- out  data1 | data2 | data3 | data4 | data5 
-- out -------+-------+-------+-------+-------
-- out      1 |    12 |    13 |    14 |   115
-- out      2 |    22 |    23 |    24 |   125
-- out      3 |    32 |    33 |    34 |   135
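
If DISTINCT over every column feels too broad, a GROUP BY variant of the same idea is sketched below against the original table names. It relies on the two conditions from the question: because all non-NULL rows for a data1 share one data4 and one data5, MAX simply returns that shared value. The alias names `t1` and `agg` are illustrative assumptions.

```sql
-- Sketch: collapse the non-NULL rows of TABLE2 to one row per data1,
-- then join that single row to TABLE1.
SELECT t1.data1,
       t1.data2,
       t1.data3,
       agg.data4,
       agg.data5
FROM TABLE1 AS t1
LEFT JOIN (
    SELECT data1,
           MAX(data4) AS data4,   -- all non-NULL rows share this value
           MAX(data5) AS data5    -- likewise for data5 on those rows
    FROM TABLE2
    WHERE data4 IS NOT NULL
    GROUP BY data1
) AS agg
  ON agg.data1 = t1.data1;
```

Unlike the DISTINCT version, this one would not surface unexpected differences between rows as duplicates; it would silently return the largest value, so it is only appropriate while the stated conditions hold.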
