I'm guessing my question was asked before, but after hours of scouring, either I'm not able to figure out how the answers match my dilemma or it truly hasn't been asked (I'm guessing the former). So, I'm going to ask again, in my own way, so that I may better understand the answer.

The goal is to grab data for a project based on criteria that may be provided to me. There are three different types of criteria: C, M, and P. I usually receive two of the three types of criteria for a project, and it's not always the same two. I need to filter on a particular type of criteria only if I've been given at least one value of that type.

Here's the general layout of my query that I'm starting with (assuming project_id = 1 for this example):

WITH
    cte_c AS (SELECT c FROM prj_rqst_c_criteria WHERE project_id = 1)
  , cte_m AS (SELECT m FROM prj_rqst_m_criteria WHERE project_id = 1)
  , cte_p AS (SELECT p FROM prj_rqst_p_criteria WHERE project_id = 1)
SELECT * FROM data_table
INNER JOIN cte_c ON data_table.c = cte_c.c
INNER JOIN cte_m ON data_table.m = cte_m.m
INNER JOIN cte_p ON data_table.p = cte_p.p
;

Assuming I have rows returned in cte_c and cte_p but no rows returned in cte_m, this would obviously yield 0 results for the entire query. What I want in that scenario is for cte_m to effectively be ignored while applying the JOINs for cte_c and cte_p. How do I modify the JOINs so that if any cte returns no rows, that particular cte will be ignored?
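A minimal sketch reproducing the problem. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the real database, integer m and p columns, and a :pid parameter in place of the hard-coded project_id. Project 1 is given c and p criteria but no m criteria, so the inner join to the empty cte_m removes every row:

```python
import sqlite3

# In-memory SQLite stand-in for the real tables (an assumption, not Teradata).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data_table (id INTEGER, c TEXT, m INTEGER, p INTEGER);
    INSERT INTO data_table VALUES
        (1, 'A', 101, 999), (2, 'B', 102, 998), (3, 'A', 103, 998),
        (4, 'A', 102, 999), (5, 'B', 101, 998);
    CREATE TABLE prj_rqst_c_criteria (project_id INTEGER, c TEXT);
    CREATE TABLE prj_rqst_m_criteria (project_id INTEGER, m INTEGER);
    CREATE TABLE prj_rqst_p_criteria (project_id INTEGER, p INTEGER);
    -- Project 1: c and p criteria only, nothing for m.
    INSERT INTO prj_rqst_c_criteria VALUES (1, 'A');
    INSERT INTO prj_rqst_p_criteria VALUES (1, 999);
""")

QUERY = """
WITH
    cte_c AS (SELECT c FROM prj_rqst_c_criteria WHERE project_id = :pid)
  , cte_m AS (SELECT m FROM prj_rqst_m_criteria WHERE project_id = :pid)
  , cte_p AS (SELECT p FROM prj_rqst_p_criteria WHERE project_id = :pid)
SELECT data_table.id FROM data_table
INNER JOIN cte_c ON data_table.c = cte_c.c
INNER JOIN cte_m ON data_table.m = cte_m.m
INNER JOIN cte_p ON data_table.p = cte_p.p
ORDER BY data_table.id
"""

rows = con.execute(QUERY, {"pid": 1}).fetchall()
print(rows)  # [] because cte_m has no rows to join to
```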

Thanks!

Edit: Some additional details: data_table has well over 1 million rows, so the goal is to return only the rows that match the criteria that were provided for the project.

Some example data:

data_table
id | c | m   | p
------------------
1  | A | 101 | 999
2  | B | 102 | 998
3  | A | 103 | 998
4  | A | 102 | 999
5  | B | 101 | 998

If I'm asked to grab a project where c = 'A' and p = '999' but I'm not given any criteria for m, then I'd want the following rows returned:

id | c | m   | p
------------------
1  | A | 101 | 999
4  | A | 102 | 999

If I'm asked to pull a project where m = 102 and c = 'A', then I just want the following returned:

id | c | m   | p
------------------
4  | A | 102 | 999
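The rule these two examples describe can be pinned down as a plain-Python reference filter. This is a hypothetical helper for checking results, not a proposed implementation, and it treats p as an integer (an assumption, since the text quotes '999'): an empty list for a criteria type means no restriction; otherwise the row's value must be one of the listed values.

```python
# Sample rows from the question: (id, c, m, p)
DATA = [
    (1, "A", 101, 999),
    (2, "B", 102, 998),
    (3, "A", 103, 998),
    (4, "A", 102, 999),
    (5, "B", 101, 998),
]

def matches(value, allowed):
    """No criteria of this type means 'no restriction'; otherwise require membership."""
    return not allowed or value in allowed

def select_rows(c=(), m=(), p=()):
    """Return the ids of rows that satisfy every criteria type that was supplied."""
    c, m, p = set(c), set(m), set(p)
    return [row[0] for row in DATA
            if matches(row[1], c) and matches(row[2], m) and matches(row[3], p)]

print(select_rows(c=["A"], p=[999]))  # [1, 4]
print(select_rows(c=["A"], m=[102]))  # [4]
```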

I hope this helps to visualize. Thanks again!

  • Use outer joins instead of inner? Perhaps a full outer or left join, so that if there is no data in M you'll still get rows from C & P. Also, your joins need to be on cte_c, cte_m, not c, m, p. Sample data and expected results would be useful to ensure we fully understand the question. Commented May 13 at 16:02
  • Wouldn't that still return all the rows in data_table, though? I don't want all of them returned. We're talking millions of rows in that data_table! I only want the ones that match the supplied criteria for the project. The sample data is definitely a good thought. I'm not in an industry where I can share real data, so give me a few minutes to fake some for you. :) Commented May 13 at 16:05
  • An obvious way might be to add a corresponding OR (SELECT COUNT(*) FROM CTE_x) = 0 to each of the inner join ON conditions. Commented May 13 at 17:24
  • Do the CTEs always return a maximum of one row? If so, why three CTEs instead of one CTE with three columns? Why a CTE at all? This feels like an XY problem in need of more context. Without context you may get the least bad solution to a poor design, rather than the best design for your real underlying needs. Commented May 13 at 18:24
  • It would still be better to have proper sample data. Can you explain the logic clearly? Is it that if a CTE returns rows then you want to filter by those rows, but if there are no rows then you want to ignore it completely? Commented May 13 at 20:54
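A sketch of the COUNT(*) suggestion from the comments, again assuming SQLite via Python's sqlite3 as a stand-in for Teradata and a :pid parameter in place of the literal project id. It checks one caveat: placed in an INNER JOIN's ON condition, the extra OR cannot rescue an empty CTE, because an inner join to a CTE with no rows returns no rows whatever the ON condition says. Moved into the WHERE clause and combined with IN, the same test behaves as intended:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data_table (id INTEGER, c TEXT, m INTEGER, p INTEGER);
    INSERT INTO data_table VALUES
        (1, 'A', 101, 999), (2, 'B', 102, 998), (3, 'A', 103, 998),
        (4, 'A', 102, 999), (5, 'B', 101, 998);
    CREATE TABLE prj_rqst_c_criteria (project_id INTEGER, c TEXT);
    CREATE TABLE prj_rqst_m_criteria (project_id INTEGER, m INTEGER);
    CREATE TABLE prj_rqst_p_criteria (project_id INTEGER, p INTEGER);
    -- Project 1: c = 'A', p = 999.  Project 2: c = 'A', m = 102.
    INSERT INTO prj_rqst_c_criteria VALUES (1, 'A'), (2, 'A');
    INSERT INTO prj_rqst_m_criteria VALUES (2, 102);
    INSERT INTO prj_rqst_p_criteria VALUES (1, 999);
""")

CTES = """
WITH
    cte_c AS (SELECT c FROM prj_rqst_c_criteria WHERE project_id = :pid)
  , cte_m AS (SELECT m FROM prj_rqst_m_criteria WHERE project_id = :pid)
  , cte_p AS (SELECT p FROM prj_rqst_p_criteria WHERE project_id = :pid)
"""

# COUNT(*) = 0 inside the INNER JOIN's ON condition: still empty when cte_m has
# no rows, because there is no cte_m row for the ON condition to be tested against.
ON_VERSION = CTES + """
SELECT data_table.id FROM data_table
INNER JOIN cte_c ON data_table.c = cte_c.c OR (SELECT COUNT(*) FROM cte_c) = 0
INNER JOIN cte_m ON data_table.m = cte_m.m OR (SELECT COUNT(*) FROM cte_m) = 0
INNER JOIN cte_p ON data_table.p = cte_p.p OR (SELECT COUNT(*) FROM cte_p) = 0
ORDER BY data_table.id
"""

# The same emptiness test moved into WHERE, combined with IN.
WHERE_VERSION = CTES + """
SELECT id FROM data_table
WHERE (c IN (SELECT c FROM cte_c) OR (SELECT COUNT(*) FROM cte_c) = 0)
  AND (m IN (SELECT m FROM cte_m) OR (SELECT COUNT(*) FROM cte_m) = 0)
  AND (p IN (SELECT p FROM cte_p) OR (SELECT COUNT(*) FROM cte_p) = 0)
ORDER BY id
"""

def ids(query, pid):
    """Run one of the queries above for a project and return the matching ids."""
    return [row[0] for row in con.execute(query, {"pid": pid})]

print(ids(ON_VERSION, 1))     # []
print(ids(WHERE_VERSION, 1))  # [1, 4]
print(ids(WHERE_VERSION, 2))  # [4]
```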

3 Answers

In the absence of any further context, this can be solved using EXISTS and NOT EXISTS.

WITH 
  param_c AS (SELECT * FROM prj_rqst_c_criteria WHERE project_id = 1),
  param_m AS (SELECT * FROM prj_rqst_m_criteria WHERE project_id = 1),
  param_p AS (SELECT * FROM prj_rqst_p_criteria WHERE project_id = 1)
SELECT
  *
FROM
  data_table    AS d
WHERE
  (
    NOT EXISTS (SELECT * FROM param_c)
    OR  EXISTS (SELECT * FROM param_c WHERE c = d.c)
  )
  AND
  (
    NOT EXISTS (SELECT * FROM param_m)
    OR  EXISTS (SELECT * FROM param_m WHERE m = d.m)
  )
  AND
  (
    NOT EXISTS (SELECT * FROM param_p)
    OR  EXISTS (SELECT * FROM param_p WHERE p = d.p)
  )
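The query above can be checked against the question's sample data with a small sketch. It assumes SQLite through Python's sqlite3 module instead of Teradata, and a :pid parameter instead of the literal 1. Projects 1 and 2 encode the two examples from the question. Project 3 has no criteria at all, which in this formulation means nothing is filtered:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data_table (id INTEGER, c TEXT, m INTEGER, p INTEGER);
    INSERT INTO data_table VALUES
        (1, 'A', 101, 999), (2, 'B', 102, 998), (3, 'A', 103, 998),
        (4, 'A', 102, 999), (5, 'B', 101, 998);
    CREATE TABLE prj_rqst_c_criteria (project_id INTEGER, c TEXT);
    CREATE TABLE prj_rqst_m_criteria (project_id INTEGER, m INTEGER);
    CREATE TABLE prj_rqst_p_criteria (project_id INTEGER, p INTEGER);
    -- Project 1: c = 'A', p = 999.  Project 2: c = 'A', m = 102.
    INSERT INTO prj_rqst_c_criteria VALUES (1, 'A'), (2, 'A');
    INSERT INTO prj_rqst_m_criteria VALUES (2, 102);
    INSERT INTO prj_rqst_p_criteria VALUES (1, 999);
""")

QUERY = """
WITH
  param_c AS (SELECT * FROM prj_rqst_c_criteria WHERE project_id = :pid),
  param_m AS (SELECT * FROM prj_rqst_m_criteria WHERE project_id = :pid),
  param_p AS (SELECT * FROM prj_rqst_p_criteria WHERE project_id = :pid)
SELECT d.id
FROM data_table AS d
WHERE (NOT EXISTS (SELECT * FROM param_c) OR EXISTS (SELECT * FROM param_c WHERE c = d.c))
  AND (NOT EXISTS (SELECT * FROM param_m) OR EXISTS (SELECT * FROM param_m WHERE m = d.m))
  AND (NOT EXISTS (SELECT * FROM param_p) OR EXISTS (SELECT * FROM param_p WHERE p = d.p))
ORDER BY d.id
"""

def matching_ids(pid):
    """Ids of data_table rows that satisfy every criteria type the project supplies."""
    return [row[0] for row in con.execute(QUERY, {"pid": pid})]

print(matching_ids(1))  # [1, 4]
print(matching_ids(2))  # [4]
print(matching_ids(3))  # [1, 2, 3, 4, 5]: no criteria, so nothing is filtered
```

If a project with no criteria at all should return nothing instead, an extra check that at least one of the param_ CTEs is non-empty would be needed.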

It's a little long-winded, but that's not the biggest problem.

The biggest problem is the repeated use of OR to cater for cases where there are no criteria for a given project.

Ideally you'd want to tell the DBMS to simply ignore that whole block, but SQL doesn't work that way. In SQL, a statement is compiled into a single execution plan that must cater for all scenarios: essentially a pessimistic, least-worst plan.

That could be avoided with dynamic SQL (using SQL to write SQL, with a custom WHERE clause).
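A minimal sketch of the dynamic-SQL idea. It assumes the SQL text is assembled client-side in Python against SQLite; in Teradata this would more likely be done in a stored procedure. The helper names load_criteria and build_query are hypothetical. Only criteria types that actually have values appear in the WHERE clause. Column names come from a fixed whitelist, and values are passed as bound parameters rather than spliced into the text:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data_table (id INTEGER, c TEXT, m INTEGER, p INTEGER);
    INSERT INTO data_table VALUES
        (1, 'A', 101, 999), (2, 'B', 102, 998), (3, 'A', 103, 998),
        (4, 'A', 102, 999), (5, 'B', 101, 998);
    CREATE TABLE prj_rqst_c_criteria (project_id INTEGER, c TEXT);
    CREATE TABLE prj_rqst_m_criteria (project_id INTEGER, m INTEGER);
    CREATE TABLE prj_rqst_p_criteria (project_id INTEGER, p INTEGER);
    INSERT INTO prj_rqst_c_criteria VALUES (1, 'A');
    INSERT INTO prj_rqst_p_criteria VALUES (1, 999);
""")

COLUMNS = ("c", "m", "p")  # fixed whitelist: only these names are spliced into SQL text

def load_criteria(pid):
    """Read each criteria type for a project into a list (empty if none supplied)."""
    return {
        col: [r[0] for r in con.execute(
            f"SELECT {col} FROM prj_rqst_{col}_criteria WHERE project_id = ?", (pid,))]
        for col in COLUMNS
    }

def build_query(criteria):
    """Build a SELECT whose WHERE clause mentions only the criteria types with values."""
    clauses, params = [], []
    for col in COLUMNS:
        values = criteria.get(col) or []
        if values:
            clauses.append(f"{col} IN ({', '.join('?' for _ in values)})")
            params.extend(values)
    sql = "SELECT id FROM data_table"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params

sql, params = build_query(load_criteria(1))
print(sql)  # SELECT id FROM data_table WHERE c IN (?) AND p IN (?) ORDER BY id
print([r[0] for r in con.execute(sql, params)])  # [1, 4]
```

Because the m clause is simply absent for project 1, the optimizer never has to plan for the "no m criteria" OR branch.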

Fiddle (adapted from ValNik's): https://dbfiddle.uk/tWuJJlY3

2 Comments

Thank you for this solution! It's not quite as elegant as I was hoping for, but it does work. Thank you!
Teradata supports "Incremental Planning and Execution" (IPE), i.e. it plans a few steps (a "Fragment"), gets some feedback after their execution and then plans the next fragment. This will not be done for small data sets, but I would expect it for the actual table. You can check this from the first sentence of the EXPLAIN output: This request is eligible for incremental planning and execution (IPE). The following is the static plan for the request. The actual/dynamic plan can be found in the QueryLog or forced by a Dynamic Explain (F7).

Consider a join condition like column_value = criteria_value or criteria_value is null.

WITH cte_cmp AS (
SELECT c,m,p 
  FROM (select 5 as project_id) d
  left  join prj_rqst_c_criteria c on c.project_id=d.project_id
  left  join prj_rqst_m_criteria m on m.project_id=d.project_id
  left  join prj_rqst_p_criteria p on p.project_id=d.project_id
  WHERE coalesce(c.project_id,m.project_id,p.project_id) is not null
)
SELECT * 
FROM data_table t
INNER JOIN cte_cmp cmp ON 
       (t.c = cmp.c or cmp.c is null)
   and (t.m = cmp.m or cmp.m is null)
   and (t.p = cmp.p or cmp.p is null)
;

The condition WHERE coalesce(c.project_id,m.project_id,p.project_id) is not null means that at least one of the criteria joins must match; if none does, the CTE returns an empty result.

See the example with test data:

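data_table:

id | c | m   | p
------------------
1  | A | 101 | 999
2  | B | 102 | 998
3  | A | 103 | 998
4  | A | 102 | 999
5  | B | 101 | 998

prj_rqst_c_criteria:

project_id | c
--------------
1          | A
2          | B
3          | B

prj_rqst_m_criteria:

project_id | m
----------------
2          | 102
3          | 101

prj_rqst_p_criteria:

project_id | p
----------------
1          | 999
3          | 998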

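project_id = 1 (c = 'A', m = null, p = '999'):

id | c | m   | p   | cmp.c | cmp.m | cmp.p
------------------------------------------
1  | A | 101 | 999 | A     | null  | 999
4  | A | 102 | 999 | A     | null  | 999

project_id = 2 (c = 'B', m = '102', p = null):

id | c | m   | p   | cmp.c | cmp.m | cmp.p
------------------------------------------
2  | B | 102 | 998 | B     | 102   | null

project_id = 3 (c = 'B', m = '101', p = '998'):

id | c | m   | p   | cmp.c | cmp.m | cmp.p
------------------------------------------
5  | B | 101 | 998 | B     | 101   | 998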

Examples with c = null, m = '102', p = null and with c = null, m = null, p = null can be seen in the fiddle.
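The query and test data above can be reproduced with a small sketch. It assumes SQLite via Python's sqlite3 instead of Teradata, and a :pid parameter in place of the literal 5. It prints the matching ids for projects 1 to 3, plus a project 4 with no criteria, for which the coalesce(...) IS NOT NULL filter leaves the CTE empty:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data_table (id INTEGER, c TEXT, m INTEGER, p INTEGER);
    INSERT INTO data_table VALUES
        (1, 'A', 101, 999), (2, 'B', 102, 998), (3, 'A', 103, 998),
        (4, 'A', 102, 999), (5, 'B', 101, 998);
    CREATE TABLE prj_rqst_c_criteria (project_id INTEGER, c TEXT);
    CREATE TABLE prj_rqst_m_criteria (project_id INTEGER, m INTEGER);
    CREATE TABLE prj_rqst_p_criteria (project_id INTEGER, p INTEGER);
    INSERT INTO prj_rqst_c_criteria VALUES (1, 'A'), (2, 'B'), (3, 'B');
    INSERT INTO prj_rqst_m_criteria VALUES (2, 102), (3, 101);
    INSERT INTO prj_rqst_p_criteria VALUES (1, 999), (3, 998);
""")

# The answer's query, with the hard-coded project id replaced by a parameter.
QUERY = """
WITH cte_cmp AS (
  SELECT c, m, p
    FROM (SELECT :pid AS project_id) d
    LEFT JOIN prj_rqst_c_criteria c ON c.project_id = d.project_id
    LEFT JOIN prj_rqst_m_criteria m ON m.project_id = d.project_id
    LEFT JOIN prj_rqst_p_criteria p ON p.project_id = d.project_id
   WHERE coalesce(c.project_id, m.project_id, p.project_id) IS NOT NULL
)
SELECT t.id
FROM data_table t
INNER JOIN cte_cmp cmp ON
       (t.c = cmp.c OR cmp.c IS NULL)
   AND (t.m = cmp.m OR cmp.m IS NULL)
   AND (t.p = cmp.p OR cmp.p IS NULL)
ORDER BY t.id
"""

def matching_ids(pid):
    """Ids of data_table rows matched by the combined cte_cmp for one project."""
    return [row[0] for row in con.execute(QUERY, {"pid": pid})]

for pid in (1, 2, 3, 4):
    print(pid, matching_ids(pid))
```

Joining one combined cte_cmp is what makes this work: a project with no m criteria still yields a cte_cmp row with a null m, whereas a separate join to an empty cte_m has no row to match at all.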

6 Comments

Thank you for this answer. Unfortunately, it doesn't seem to be working for me. The joins on the tables with 0 results are still causing the entire query to return 0 results. I did join them separately, given that the three sets of criteria don't have anything to do with each other: JOIN cte_c ON (data_table.c = cte_c.c OR cte_c.c IS NULL) JOIN cte_m ON (data_table.c = cte_m.m OR cte_m.m IS NULL) JOIN cte_p ON (data_table.c = cte_p.p OR cte_p.p IS NULL) Might that cause an issue with your proposed solution?
I just changed the query. The CTE now returns rows if any of the criteria are present, and returns an empty result if none are. Please try again. I'm still thinking about your question.
I really want this solution to work, and it makes sense to me that it should work. Unfortunately, I'm still getting 0 results when any of the criteria is null.
Just remove the WHERE COALESCE(...) IS NOT NULL?

If we assume each CTE results in a single row with a single value, then we can use coalesce and reference the data_table's respective value. If, however, c, m, p could have multiple rows, this will not work.

WITH
    -- MAX() without GROUP BY always returns exactly one row, NULL when the project
    -- has no criteria of that type (valid only under the single-value assumption above)
    cte_c AS (SELECT MAX(c) AS c FROM prj_rqst_c_criteria WHERE project_id = 1)
  , cte_m AS (SELECT MAX(m) AS m FROM prj_rqst_m_criteria WHERE project_id = 1)
  , cte_p AS (SELECT MAX(p) AS p FROM prj_rqst_p_criteria WHERE project_id = 1)
SELECT * FROM data_table
INNER JOIN cte_c ON data_table.c = coalesce(cte_c.c, data_table.c)
INNER JOIN cte_m ON data_table.m = coalesce(cte_m.m, data_table.m)
INNER JOIN cte_p ON data_table.p = coalesce(cte_p.p, data_table.p)
;

Essentially, if the value from the CTE is null, the data_table's own value is used for the join, which makes the join condition always evaluate to TRUE; and since each cte_? is always one record, we don't get record fanning/explosion.

Similar answer: Teradata 13: CASE statement in Join

Essentially, we only filter when there is data; otherwise the table value is compared with itself (like 1=1), so the condition always evaluates to true. And since each CTE is only one record, we will not encounter record fanning.
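A sketch comparing the coalesce approach with and without wrapping each CTE column in MAX(). It assumes SQLite via Python's sqlite3 and a :pid parameter, and the helpers cte and matching_ids are hypothetical. With a plain CTE, a project with no m criteria produces an empty cte_m, so the inner join returns nothing before coalesce ever comes into play. An aggregate without GROUP BY always returns exactly one row (NULL over no input), which gives coalesce the null it needs. This only holds under the single-value assumption, since MAX() keeps just one value per type:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE data_table (id INTEGER, c TEXT, m INTEGER, p INTEGER);
    INSERT INTO data_table VALUES
        (1, 'A', 101, 999), (2, 'B', 102, 998), (3, 'A', 103, 998),
        (4, 'A', 102, 999), (5, 'B', 101, 998);
    CREATE TABLE prj_rqst_c_criteria (project_id INTEGER, c TEXT);
    CREATE TABLE prj_rqst_m_criteria (project_id INTEGER, m INTEGER);
    CREATE TABLE prj_rqst_p_criteria (project_id INTEGER, p INTEGER);
    -- Project 1: c = 'A', p = 999, no m criteria.
    INSERT INTO prj_rqst_c_criteria VALUES (1, 'A');
    INSERT INTO prj_rqst_p_criteria VALUES (1, 999);
""")

def cte(col, use_max):
    """One CTE definition; MAX() forces exactly one row (NULL when nothing matched)."""
    expr = f"MAX({col})" if use_max else col
    return (f"cte_{col} AS (SELECT {expr} AS {col} "
            f"FROM prj_rqst_{col}_criteria WHERE project_id = :pid)")

def matching_ids(pid, use_max):
    """Run the coalesce-join query with plain or MAX()-wrapped CTEs."""
    sql = (
        "WITH " + ", ".join(cte(col, use_max) for col in ("c", "m", "p"))
        + " SELECT data_table.id FROM data_table"
        " INNER JOIN cte_c ON data_table.c = coalesce(cte_c.c, data_table.c)"
        " INNER JOIN cte_m ON data_table.m = coalesce(cte_m.m, data_table.m)"
        " INNER JOIN cte_p ON data_table.p = coalesce(cte_p.p, data_table.p)"
        " ORDER BY data_table.id"
    )
    return [row[0] for row in con.execute(sql, {"pid": pid})]

print(matching_ids(1, use_max=False))  # []: cte_m has no row to join to
print(matching_ids(1, use_max=True))   # [1, 4]
```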

1 Comment

Thanks for the feedback! Unfortunately, the ctes can and very often do yield more than just 1 result each. I only showed one in the example to keep the example simple. I will keep this here, though, in case I have other scenarios with max of 1 so I can try it at that point. Thanks!!
