I'm trying to iterate over a table that has multiple rows per ClientID. Suppose the following table:

Table1
#| ClientID | Begin_Date | End_date   | Product |
1| 1267     | 2018-02-01 | 2019-07-03 | A       |
2| 1267     | 2019-07-03 | 2020-11-14 | A       |
3| 1267     | 2020-03-01 | 2021-03-01 | B       |
4| 6543     | 2017-07-15 | 2018-07-15 | C       |
5| 6543     | 2018-07-15 | 2019-07-15 | C       |
6| 1599     | 2020-03-17 | 2021-03-17 | A       |
7| 1599     | 2020-05-17 | 2021-05-17 | D       |
*Note that no column holds a unique value, except for the row number (#).

I would like to apply Python-like for-loop logic, something like:

For every unique ClientID in Table1 (each can span multiple rows), check whether any row's End_date matches the Begin_date of another row with the same ClientID (optionally: matches a given argument), and if so, check whether Product is the same on the two matching rows. Put the result in a new column, with values such as 'match' or 'no match'. In the result for Table1, I would like to see ClientID 1267 and 6543, because for each of them the End_date of one row equals the Begin_date of the next row, and both rows have the same Product.

I tried various things with CASE expressions and a user-defined function (using a WHILE loop), but I can't figure it out.

create function Check_test (@enddate datetime)
returns nvarchar
as
begin
    declare @rownum int
    declare @custid nvarchar(20)
    declare @name nvarchar(20)
    declare @chck nvarchar(15)
    declare @datematch nvarchar(10)
    select @custid = max(ClientID) from Table1
    select @rownum = count(*) from Table1
    while @rownum > 0
            begin
                select @name = ClientID from Table1 where ClientID = @custid
                set @chck = case when month(@enddate) = 2 and year(@enddate) = 2019 then 'NEXT' else 'STOP' end
                set @datematch = case when @chck = 'NEXT' then 'MATCH' else 'NO MATCH' end
                -- unsure from this point on
                select top 1 @custid=ClientID from raw_import_202003 where ClientID < @custid order by ClientID desc
                set @rownum = @rownum - 1
            end
    return --some output variable
end

Of course, the problem is that it's incomplete, and it's also a scalar user-defined function, which returns only a single value.

Any thoughts anyone?

** EDIT **

Thanks a lot for your answers so far! I've tried your solutions, but as some of you already mentioned, my question was not quite accurate, which is why I'm still stuck. I'm sorry, I'm new to posting on forums! Please let me try again :)

I consulted the business-case owner. First of all, forget all of the above stuff. The actual data I'm trying to analyze looks like this (values are in Dutch, suppliers are anonymized):

CLIENT_ID | CLIENT_ROW | REFERENCE  | STARTDATE  | ENDDATE    | PRODUCT              | SUPPLIER | VOLUME | CATEGORY
9.325     | 1          | 12027-1    | 2017-03-01 | 2018-03-01 | PGB Logeren          | N        | 30     | Jeugd GGZ Specialistisch
9.325     | 2          | 12027-2    | 2017-03-01 | 2018-03-01 | PGB Begeleiding      | N        | 96     | Jeugd GGZ Specialistisch
9.325     | 3          | 12057-1    | 2016-11-01 | 2016-12-01 | Basis GGZ Middel     | A        | 1      | Basis GGZ Midden
9.325     | 4          | 12058-1    | 2017-01-01 | 2017-03-01 | GGZ Basis Middel     | A        | 1      | Basis GGZ Midden
9.325     | 5          | 16536-1    | 2018-03-01 | 2019-02-01 | Kortdurend Verblijf  | B        | 2      | Jeugd met een beperking
9.325     | 6          | 16536-2    | 2018-03-01 | 2018-03-01 | Begeleiding Ambulant | C        | 120    | Jeugd met een beperking
9.325     | 7          | 16536-3    | 2018-03-01 | 2019-02-01 | Dagbesteding         | B        | 12     | Jeugd met een beperking
9.325     | 8          | 16536-4    | 2018-03-01 | 2019-02-01 | Kortdurend Verblijf  | B        | 6      | Jeugd met een beperking
9.325     | 9          | 18563-1    | 2018-09-01 | 2019-02-01 | Vervoer              | D        | 2      | Jeugd met een beperking
9.325     | 10         | 20201-1    | 2019-03-01 | 2020-02-01 | Kortdurend Verblijf  | B        | 33     | Jeugd met een beperking
9.325     | 11         | 20201-2    | 2019-03-01 | 2020-02-01 | Dagbesteding         | B        | 14     | Jeugd met een beperking
9.325     | 12         | 20201-3    | 2019-03-01 | 2020-02-01 | Vervoer              | D        | 24     | Jeugd met een beperking
9.325     | 13         | 23736-1    | 2020-01-01 | 2020-12-01 | Kinderplein          | E        | 5      | Kindergeneeskunde
9.325     | 14         | 6189-1     | 2015-01-01 | 2015-04-01 | Begeleiding          | F        | 1      | Jeugd met een beperking
9.325     | 15         | 6192-1     | 2015-01-01 | 2015-04-01 | Logeren              | F        | 1      | Jeugd met een beperking
9.325     | 16         | 6973-1     | 2013-01-01 | 2015-12-01 | Behandeling kort     | G        | 1      | Jeugd GGZ Specialistisch
9.325     | 17         | 8216-1     | 2015-04-01 | 2016-01-01 | Logeren              | F        | 4      | Jeugd met een beperking
9.325     | 18         | 9775-1     | 2016-01-01 | 2016-01-01 | Logeren              | F        | 2      | Jeugd met een beperking
6.693     | 1          | 11042-1    | 2016-07-01 | 2017-07-01 | PGB Dagactiviteit    | N        | 13     | Jeugd GGZ Specialistisch
6.693     | 2          | 11042-2    | 2016-07-01 | 2017-07-01 | PGB Logeren          | N        | 31     | Jeugd GGZ Specialistisch
6.693     | 3          | 11756-1    | 2017-01-01 | 2017-07-01 | Dagactiviteit        | H        | 10     | Jeugd met een beperking
6.693     | 4          | 12517-1    | 2017-03-01 | 2017-12-01 | PGB Begeleiding      | N        | 24     | Jeugd GGZ Specialistisch
6.693     | 5          | 13450-1    | 2017-02-01 | 2017-03-01 | GGZ Basis            | A        | 1      | Onvolledig Behandeltraject
6.693     | 6          | 13734-1    | 2017-07-01 | 2018-03-01 | Dagactiviteit        | H        | 8      | Jeugd met een beperking
6.693     | 7          | 13734-2    | 2017-07-01 | 2017-10-01 | PGB Dagactiviteit    | N        | 13     | Jeugd GGZ Specialistisch
6.693     | 8          | 13734-3    | 2017-07-01 | 2017-10-01 | PGB Logerenl         | N        | 8      | Jeugd GGZ Specialistisch
6.693     | 9          | 13734-4    | 2018-03-01 | 2018-07-01 | Dagbesteding         | H        | 3      | Jeugd met een beperking
6.693     | 10         | 17996-1    | 2018-07-01 | 2019-07-01 | Dagbesteding         | H        | 3      | Jeugd met een beperking
6.693     | 11         | 21459-1    | 2019-07-01 | 2020-07-01 | Dagbesteding         | H        | 3      | Jeugd met een beperking
6.693     | 12         | 21628-1    | 2019-09-01 | 2020-08-01 | Kortdurend Verblijf  | N        | 8      | Jeugd met een beperking
6.693     | 13         | 6142-1     | 2015-01-01 | 2015-02-01 | Dagactiviteit        | H        | 2      | Jeugd met een beperking
6.693     | 14         | 8865-1     | 2015-03-01 | 2015-06-01 | Basis GGZ Intensief  | I        | 1      | Jeugd GGZ Specialistisch
6.693     | 15         | 9138-1     | 2015-01-01 | 2016-12-01 | Dagactiviteit        | H        | 2      | Jeugd met een beperking

My question:

This sample holds two 'blocks' of client_ids. Within a block, I would like to check which rows have a startdate that is the same as any enddate within the same block. If there's a match, the query should return both rows. In more depth: I'm trying to find clients with products that have been 'retained' by suppliers. Ideally, I would even like to see which products start in the month after another one ends. So if any startdate = any enddate + 1 month, return those rows. Is this achievable in SQL, and if so, how?

Note 1: I'm only interested in seeing matching months and years, which is why I set all days to 01.
Note 2: column REFERENCE looks like a unique value, but that's only the case for this sample.

Desired result (for CLIENT_ID '9325'):

9.325     | 1          | 12027-1    | 2017-03-01 | 2018-03-01 | PGB Logeren          | N        | 30     | Jeugd GGZ Specialistisch
9.325     | 2          | 12027-2    | 2017-03-01 | 2018-03-01 | PGB Begeleiding      | N        | 96     | Jeugd GGZ Specialistisch
9.325     | 4          | 12058-1    | 2017-01-01 | 2017-03-01 | GGZ Basis Middel     | A        | 1      | Basis GGZ Midden
9.325     | 5          | 16536-1    | 2018-03-01 | 2019-02-01 | Kortdurend Verblijf  | B        | 2      | Jeugd met een beperking
9.325     | 6          | 16536-2    | 2018-03-01 | 2018-03-01 | Begeleiding Ambulant | C        | 120    | Jeugd met een beperking
9.325     | 7          | 16536-3    | 2018-03-01 | 2019-02-01 | Dagbesteding         | B        | 12     | Jeugd met een beperking
9.325     | 8          | 16536-4    | 2018-03-01 | 2019-02-01 | Kortdurend Verblijf  | B        | 6      | Jeugd met een beperking
9.325     | 17         | 8216-1     | 2015-04-01 | 2016-01-01 | Logeren              | F        | 4      | Jeugd met een beperking
9.325     | 18         | 9775-1     | 2016-01-01 | 2016-01-01 | Logeren              | F        | 2      | Jeugd met een beperking

The result for CLIENT_ID '6693' should be similar.

All help is really appreciated. Thanks a lot so far!!

  • (1) Explain the logic in English, not Pythonese. (2) Show the desired results. Commented Mar 17, 2020 at 21:03
  • (1) Pardon my French, what is unclear in this section? (2) Desired result would be an added column to the initial Table1, named matching_date_product, with boolean values like TRUE and FALSE. Commented Mar 17, 2020 at 21:07
  • Using SQL requires a shift of mindset which is usually quite challenging for programmers. It is a different kind of logic. Please supply the required result (similar to what you supplied as a data sample). Commented Mar 17, 2020 at 21:44
  • Have you looked into using a cursor to get your data and with that loop through the records collected by the cursor? Commented Mar 17, 2020 at 23:45
  • @DavidדודוMarkovitz Thanks for answering. I edited the question with desired result sample. Commented Mar 23, 2020 at 9:22

3 Answers

Your question is hard to understand. But it seems to me like you want to use the lead() window function to get the "next" date and product.

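SELECT DISTINCT clientid
FROM (SELECT clientid,
             -- Flag a row when the next row for the same client (in row order)
             -- starts on this row's end date and has the same product.
             CASE
               WHEN lead(begin_date) OVER (PARTITION BY clientid ORDER BY "#") = end_date
                    AND lead(product) OVER (PARTITION BY clientid ORDER BY "#") = product
               THEN 1
             END AS c
      FROM table1) x
WHERE c = 1;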
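
If you want the per-row 'match' / 'no match' column the question originally asked for, rather than a list of clients, the same lead() idea can be sketched as below. This is only an illustration under assumptions: the temp table #Table1, the column name RowNum (standing in for the # column) and the output column match_status are made up for the example, and lead() needs SQL Server 2012 or later.

```sql
-- Hypothetical temp table holding the sample rows from the question.
CREATE TABLE #Table1 (
    RowNum     int,      -- stands in for the "#" column (assumed name)
    ClientID   int,
    Begin_Date date,
    End_date   date,
    Product    char(1)
);

INSERT INTO #Table1 (RowNum, ClientID, Begin_Date, End_date, Product) VALUES
    (1, 1267, '2018-02-01', '2019-07-03', 'A'),
    (2, 1267, '2019-07-03', '2020-11-14', 'A'),
    (3, 1267, '2020-03-01', '2021-03-01', 'B'),
    (4, 6543, '2017-07-15', '2018-07-15', 'C'),
    (5, 6543, '2018-07-15', '2019-07-15', 'C'),
    (6, 1599, '2020-03-17', '2021-03-17', 'A'),
    (7, 1599, '2020-05-17', '2021-05-17', 'D');

-- Compare each row with the next row of the same client.
-- For the last row of a client lead() returns NULL, so the CASE falls to ELSE.
SELECT t.*,
       CASE
         WHEN LEAD(Begin_Date) OVER (PARTITION BY ClientID ORDER BY RowNum) = End_date
          AND LEAD(Product)    OVER (PARTITION BY ClientID ORDER BY RowNum) = Product
         THEN 'match'
         ELSE 'no match'
       END AS match_status   -- hypothetical column name
FROM #Table1 AS t
ORDER BY RowNum;

DROP TABLE #Table1;
```

On the sample data this flags rows 1 and 4 as 'match' (clients 1267 and 6543) and every other row as 'no match'. Note that it only looks at the immediately following row, so it would miss a matching row that is not adjacent in row order.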


1 Comment

Thanks for your answer; it really helped me get on track, but it does not give the desired result yet. I edited my question with the desired result.

The following query should solve your problem:

-- Rows whose End_Date is the Begin_Date of another row for the same client and product
SELECT fo.id, fo.ClientID, fo.Begin_Date, fo.End_Date, fo.Product, 'Match' AS Match_Status
FROM clients AS fo
INNER JOIN clients AS fs ON fo.End_Date = fs.Begin_Date AND fo.Product = fs.Product AND fo.ClientID = fs.ClientID

UNION ALL

-- Rows with no such follow-up row
SELECT fo.id, fo.ClientID, fo.Begin_Date, fo.End_Date, fo.Product, 'No-Match' AS Match_Status
FROM clients AS fo
LEFT JOIN clients AS fs ON fo.End_Date = fs.Begin_Date AND fo.Product = fs.Product AND fo.ClientID = fs.ClientID
WHERE fs.id IS NULL;

If the dataset is big, I strongly suggest adding an index to your table.
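
The same match / no-match split can also be written as a single pass with EXISTS, which labels each row exactly once even when several follow-up rows match it (the INNER JOIN version repeats the row once per match). This is a rough sketch, not a tested replacement: the temp table #clients and the column name Match_Status are assumptions made for the example.

```sql
-- Hypothetical temp table with the first three sample rows from the question.
CREATE TABLE #clients (
    id         int,
    ClientID   int,
    Begin_Date date,
    End_Date   date,
    Product    char(1)
);

INSERT INTO #clients (id, ClientID, Begin_Date, End_Date, Product) VALUES
    (1, 1267, '2018-02-01', '2019-07-03', 'A'),
    (2, 1267, '2019-07-03', '2020-11-14', 'A'),
    (3, 1267, '2020-03-01', '2021-03-01', 'B');

-- One output row per input row: 'Match' when some row of the same client
-- and product begins on this row's End_Date, 'No-Match' otherwise.
SELECT fo.id, fo.ClientID, fo.Begin_Date, fo.End_Date, fo.Product,
       CASE
         WHEN EXISTS (SELECT 1
                      FROM #clients AS fs
                      WHERE fs.ClientID   = fo.ClientID
                        AND fs.Product    = fo.Product
                        AND fs.Begin_Date = fo.End_Date)
         THEN 'Match'
         ELSE 'No-Match'
       END AS Match_Status   -- hypothetical column name
FROM #clients AS fo
ORDER BY fo.id;

DROP TABLE #clients;
```

With these three rows, id 1 comes back as 'Match' and ids 2 and 3 as 'No-Match'. If a row whose Begin_Date equals its own End_Date should not count as matching itself, adding `AND fs.id <> fo.id` inside the subquery would exclude that case.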

4 Comments

"If the dataset is big, I strongly suggest you to add an index to your table". Nope.
@DavidדודוMarkovitz Why not?
Because you are running on the whole dataset and SQL Server supports Hash Join. Indexes have cost penalty for DML and are challenging for maintenance, especially for big tables. Actually, the worst performance you get for analytical queries is caused by unnecessary use of indexes.
@osumatu: thanks for answering. This did not quite get my desired result yet, but I edited my question with new information.

Thanks to all for your support! I managed to get the desired result using this query:

select 
t1.client_id        as client_id1   
,t1.client_row      as client_row1 
,t1.reference       as reference1 
,t1.startdate       as startdate1  
,t1.enddate         as enddate1    
,t1.product         as product1    
,t1.supplier        as supplier1   
,t1.volume          as volume1     
,t1.category        as category1   
,t2.client_id       as client_id2  
,t2.client_row      as client_row2 
,t2.reference       as reference2  
,t2.startdate       as startdate2  
,t2.enddate         as enddate2    
,t2.product         as product2    
,t2.supplier        as supplier2   
,t2.volume          as volume2     
,t2.category        as category2   
from table1 t1
left join table1 t2 on 
        t1.client_id                                        =   t2.client_id
    and DATEADD(MONTH, DATEDIFF(MONTH, 0, t1.enddate), 0)   =   DATEADD(MONTH, DATEDIFF(MONTH, 0, t2.startdate), 0)
    and DATEADD(YEAR, DATEDIFF(YEAR, 0, t1.enddate), 0)     =   DATEADD(YEAR, DATEDIFF(YEAR, 0, t2.startdate), 0)
    and t1.client_row                                       <>  t2.client_row
    and t1.reference                                        <>  t2.reference
    and t1.product                                          =   t2.product
    and t1.supplier                                         =   t2.supplier
    and t1.volume                                           =   t2.volume
where t1.category = 'Jeugd met een beperking' and t2.category = 'Jeugd met een beperking'

I created the client_row column using @sticky-bit's advice and created the query using @osumatu's advice!
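
For anyone who wants the matching rows themselves (one row each, as in the desired result of the question) instead of side-by-side pairs, a possible variant is sketched below. It relies on DATEDIFF(MONTH, a, b) counting month boundaries, so a result of 0 means the same year and month, which covers both the month and the year comparison in one step, and a result of 1 means "the following month". The temp table #care, its reduced column list and the variable @gap_months are assumptions made for this illustration, and the extra product, supplier, volume and category filters from the query above are left out.

```sql
-- Hypothetical temp table with a few rows of client 9.325 from the question.
CREATE TABLE #care (
    client_id  int,
    client_row int,
    startdate  date,
    enddate    date,
    product    varchar(50)
);

INSERT INTO #care (client_id, client_row, startdate, enddate, product) VALUES
    (9325,  1, '2017-03-01', '2018-03-01', 'PGB Logeren'),
    (9325,  4, '2017-01-01', '2017-03-01', 'GGZ Basis Middel'),
    (9325,  5, '2018-03-01', '2019-02-01', 'Kortdurend Verblijf'),
    (9325,  9, '2018-09-01', '2019-02-01', 'Vervoer'),
    (9325, 10, '2019-03-01', '2020-02-01', 'Kortdurend Verblijf');

-- 0 = a row starts in the month another row ends; 1 = it starts the month after.
DECLARE @gap_months int = 0;

SELECT t.*
FROM #care AS t
WHERE EXISTS (SELECT 1
              FROM #care AS o
              WHERE o.client_id  = t.client_id
                AND o.client_row <> t.client_row
                AND (   DATEDIFF(MONTH, o.enddate, t.startdate) = @gap_months   -- t follows o
                     OR DATEDIFF(MONTH, t.enddate, o.startdate) = @gap_months)) -- o follows t
ORDER BY t.client_id, t.client_row;

DROP TABLE #care;
```

With @gap_months = 0 these five rows yield client_row 1, 4 and 5; with @gap_months = 1 they yield client_row 5, 9 and 10, because rows 5 and 9 end in 2019-02 and row 10 starts in 2019-03. Since the rule is applied in both directions, every row on either side of a match is returned, so on the full sample this can include rows that the hand-made desired result leaves out.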
