We have a couple million rows of data that we need to "explode" out by adding a row for every date between the started_at date and the ended_at date. The while loop is what is taking the longest in our query.

Any idea on how to optimize or replace it?

IF (OBJECT_ID('TempDb..#exploded_services') IS NOT NULL)
  DROP TABLE #exploded_services;

CREATE TABLE #exploded_services
  (
   target_date date,
   move_id varchar(30),
   initiation_id varchar(30),
   initiated_at date,
   booked_at date,
   transferee varchar(60),
   account_id varchar(30),
   mc_id varchar(30),
   po varchar(60),
   weight int,
   service varchar(150),
   started_at date,
   ended_at date,
   location_id nvarchar(64),
   description varchar(max),
   provider varchar(max),
   mode varchar(60),
   origin_location_id nvarchar(64),
   destination_location_id nvarchar(64),
   transferee_phone varchar(40),
   transferee_email varchar(100),
   status varchar(10),
   ordinal int
  );


WHILE (@pointer <= @end_date)
 BEGIN
   INSERT INTO #exploded_services
   SELECT
     @pointer,
     svcs.*
   FROM #Services svcs
   WHERE @pointer BETWEEN svcs.started_at AND COALESCE(svcs.ended_at,@end_date)
   SET @pointer = DATEADD(dd, 1, @pointer)
 END;
  • Add a row number in the SELECT statement and use DATEADD(dd, <row-number column>, @pointer) in the WHERE clause. A single SELECT statement can then insert all the rows. Commented Apr 15, 2019 at 20:57
  • Just do this with a single insert statement. What is the point of the loop here? Commented Apr 15, 2019 at 21:00
  • Please read up on the difference between declarative and imperative language structures. Therein lies your answer. NEVER use loops inside SQL declarative statements. Commented Apr 15, 2019 at 21:05
  • Also, please don't use shorthand like dd. Not much more effort to type day, but it sure is more readable (never mind reliable). Commented Apr 15, 2019 at 21:07
  • You are creating days for date ranges. In a programming language this is done with a loop. In SQL you would typically use a recursive query for this. I don't have the time now to post an answer. Hopefully, someone else will. Commented Apr 15, 2019 at 21:16

3 Answers

  1. Create a table with one date column.
  2. Populate it with all possible dates that apply to your services.
  3. Populate your target table with:
 INSERT INTO #exploded_services
   SELECT
     dates_table.date,
     svcs.*
   FROM #Services svcs
   INNER JOIN dates_table ON dates_table.date BETWEEN svcs.started_at AND COALESCE(svcs.ended_at,_arbitrary_end_date_)
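
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration: the bounds `@first_date` and `@last_date` are hypothetical values chosen for the example, and the table and column names simply mirror `dates_table.date` from the join. The row source (`sys.all_objects` cross-joined with itself) is one common way to get enough rows to number; any sufficiently large row source would do.

```sql
-- Hypothetical bounds for the calendar; pick whatever covers your services.
DECLARE @first_date date = '2019-01-01';
DECLARE @last_date  date = '2019-12-31';

-- Step 1: a table with a single date column.
CREATE TABLE dates_table ([date] date NOT NULL PRIMARY KEY);

-- Step 2: one row per day from @first_date through @last_date (inclusive).
INSERT INTO dates_table ([date])
SELECT DATEADD(day, v.n, @first_date)
FROM (
    -- Number as many rows as there are days in the range, starting at 0.
    SELECT TOP (DATEDIFF(day, @first_date, @last_date) + 1)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
) AS v;
```

The primary key on the date column gives the optimizer an index to use for the range join in step 3.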

This can be achieved using a Tally table. Here's an example of how to do it using one created on the fly with cascading CTEs.

WITH 
E(n) AS(
    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
),
E2(n) AS(
    SELECT a.n FROM E a, E b
),
E4(n) AS(
    SELECT a.n FROM E2 a, E2 b
),
cteTally(n) AS(
    SELECT TOP(DATEDIFF(DD, @pointer, @end_date) + 1) 
            ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 n
    FROM E4
)
INSERT INTO #exploded_services
SELECT
    DATEADD(dd, t.n, @pointer),
    svcs.*
FROM #Services svcs
JOIN cteTally t ON DATEADD(dd, t.n, @pointer) BETWEEN svcs.started_at AND COALESCE(svcs.ended_at,@end_date);
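
To see what the tally join produces, here is a small self-contained sketch. The table variable `@svc`, its two sample rows, and the date values are made up for illustration; only the pattern is taken from the answer, and the tally is cut down to 100 rows because the sample range is short.

```sql
-- Hypothetical range and sample data, for illustration only.
DECLARE @pointer  date = '2019-04-01';
DECLARE @end_date date = '2019-04-10';

DECLARE @svc TABLE (move_id varchar(30), started_at date, ended_at date NULL);
INSERT INTO @svc (move_id, started_at, ended_at)
VALUES ('A', '2019-04-02', '2019-04-04'),
       ('B', '2019-04-08', NULL);          -- open-ended: runs to @end_date

WITH E(n) AS (
    SELECT n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS v(n)
),
E2(n) AS (
    SELECT a.n FROM E AS a CROSS JOIN E AS b   -- 100 rows
),
cteTally(n) AS (
    -- Offsets 0, 1, 2, ... one per day in the range.
    SELECT TOP (DATEDIFF(day, @pointer, @end_date) + 1)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int)
    FROM E2
)
SELECT DATEADD(day, t.n, @pointer) AS target_date, s.move_id
FROM @svc AS s
JOIN cteTally AS t
  ON DATEADD(day, t.n, @pointer)
     BETWEEN s.started_at AND COALESCE(s.ended_at, @end_date)
ORDER BY s.move_id, target_date;
```

With this sample data the query returns six rows: April 2 through 4 for move A and April 8 through 10 for move B.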


You could try the code below, which uses a recursive CTE to generate all the dates needed:

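 -- recursive CTE to generate every date from @pointer through @end_date
 ;with cte as (
    select @pointer ptr
    union all
    -- advance from the previous row's date, not from the fixed variable
    select DATEADD(dd, 1, ptr) from cte
    where ptr < @end_date
 )
 -- adjusted insert query
 INSERT INTO #exploded_services
 select c.*, s.*
 from #Services s
 join cte c on c.ptr between s.started_at and coalesce(s.ended_at,@end_date)
 -- the default recursion limit is 100; lift it for ranges longer than 100 days
 option (maxrecursion 0)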
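
The date-generating part can be checked on its own before wiring it into the insert. The sketch below assumes hypothetical bounds covering all of 2019. The recursive member has to step from the previous row's `ptr`, and because SQL Server stops a recursive CTE after 100 recursions by default, a range this long needs the `MAXRECURSION` hint.

```sql
-- Hypothetical bounds, for illustration only.
DECLARE @pointer  date = '2019-01-01';
DECLARE @end_date date = '2019-12-31';

WITH cte AS (
    SELECT @pointer AS ptr                 -- anchor: first date in the range
    UNION ALL
    SELECT DATEADD(day, 1, ptr)            -- step from the previous row
    FROM cte
    WHERE ptr < @end_date                  -- stop once @end_date is reached
)
SELECT COUNT(*) AS days_generated
FROM cte
OPTION (MAXRECURSION 0);                   -- 0 removes the default limit of 100
```

For these bounds the count is 365, one row per day of 2019. Without the hint, the statement fails once the recursion limit is exhausted.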
