Grouping consecutive dates in PostgreSQL

Question

I have two tables which I need to combine as sometimes some dates are found in table A and not in table B and vice versa. My desired result is that for those overlaps on consecutive days be combined.

I'm using PostgreSQL.

Table A

id  startdate   enddate
--------------------------
101 12/28/2013  12/31/2013

Table B

id  startdate   enddate
--------------------------
101 12/15/2013  12/15/2013
101 12/16/2013  12/16/2013
101 12/28/2013  12/28/2013
101 12/29/2013  12/31/2013

Desired Result

id  startdate   enddate
-------------------------
101 12/15/2013  12/16/2013
101 12/28/2013  12/31/2013

Your intention is not clear. What happened to id=102 from table_b ? — wildplasser
– wildplasser, Commented Sep 16, 2015 at 9:11
@wildplasser it's part of the sequence running from 12/28 to 12/31... — mlinth
– mlinth, Commented Sep 16, 2015 at 9:14
So, when comparing the records, you use the dates (or date ranges), but you ignore the ids. Then where does the id in the output stem from? — wildplasser
– wildplasser, Commented Sep 16, 2015 at 9:18
@wildplasser edited the typo in the id of the result. my mistake sorry. — mtryingtocode
– mtryingtocode, Commented Sep 16, 2015 at 9:25

mlinth · Accepted Answer · 2015-09-16 18:02:23Z

2

Right. I have a query that I think works. It certainly works on the sample records you provided. It uses a recursive CTE.

First, you need to merge the two tables. Next, use a recursive CTE to get the sequences of overlapping dates. Finally, get the start and end dates, and join back to the "merged" table to get the id.

with recursive allrecords as -- this merges the input tables. Add a unique row identifier
(
    select *, row_number() over (ORDER BY startdate) as rowid from
    (select * from table1
    UNION
    select * from table2) a
),
 path as ( -- the recursive CTE. This gets the sequences
    select rowid as parent,rowid,startdate,enddate from allrecords a
    union 
    select p.parent,b.rowid,b.startdate,b.enddate from  allrecords b  join path p on (p.enddate + interval '1 day')>=b.startdate and p.startdate <= b.startdate
)


SELECT id,g.startdate,g.enddate FROM -- outer query to get the id

    -- inner query to get the start and end of each sequence
    (select parent,min(startdate) as startdate, max(enddate) as enddate from
    (
        select *, row_number() OVER (partition by rowid order by parent,startdate) as row_number from path
    ) a
    where row_number = 1 -- We only want the first occurrence of each record
    group by parent)g
INNER JOIN allrecords a on a.rowid = parent

edited Sep 16, 2015 at 18:02

answered Sep 16, 2015 at 17:55

mlinth

3,1586 gold badges35 silver badges32 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

mtryingtocode Over a year ago

I did some adjustments to still include the id in the result and works fine currently. Will do some more tests on the combination of dates. Couldn't thank you enough for this one!

wildplasser · Accepted Answer · 2015-09-16 20:05:42Z

The below fragment does what you intend. (but it will probably be very slow) The problem is that detecteng (non)overlapping dateranges is impossible with standard range operators, since a range could be split into two parts. So, my code does the following:

split the dateranges from table_A into atomic records, with one date per record
[the same for table_b]
cross join these two tables (we are only interested in A_not_in_B, and B_not_in_A) , remembering which of the L/R outer join wings it came from.
re-aggregate the resulting records into date ranges.

-- EXPLAIN ANALYZE
-- 
WITH  RECURSIVE ranges AS (
            -- Chop up the a-table into atomic date units
    WITH ar AS (
            SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date  AS thedate
            ,  'A'::text AS which
            , a.id
            FROM a
            )
            -- Same for the b-table
    , br AS (
            SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date  AS thedate
            ,  'B'::text AS which
            , b.id
            FROM b
            )
            -- combine the two sets, retaining a_not_in_b plus b_not_in_a
    , moments AS (
            SELECT COALESCE(ar.id,br.id) AS id
            , COALESCE(ar.which, br.which) AS which
            , COALESCE(ar.thedate, br.thedate) AS thedate
            FROM ar
            FULL JOIN br ON br.id = ar.id AND br.thedate =  ar.thedate
            WHERE ar.id IS NULL OR br.id IS NULL
            )
            -- use a recursive CTE to re-aggregate the atomic moments into ranges
    SELECT m0.id, m0.which
            , m0.thedate AS startdate
            , m0.thedate AS enddate
    FROM moments m0
    WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id  AND nx.which = m0.which
            AND nx.thedate = m0.thedate -1
            )
    UNION ALL
    SELECT rr.id, rr.which
            , rr.startdate AS startdate
            , m1.thedate AS enddate
    FROM ranges rr
    JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
    )
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
    -- suppress partial subassemblies
    WHERE nx.id = ra.id AND nx.which = ra.which
    AND nx.startdate = ra.startdate
    AND nx.enddate > ra.enddate
    )
 ;

Collectives™ on Stack Overflow

Grouping consecutive dates in PostgreSQL

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related