Postgres SQL - How do I aggregated groups of multiple consecutive rows?

Question

I have a data representing tagged continuous-spans in a single table with a <tag, start & stop>.

Example below.

I'm trying to combine multiple rows into a single row where the condition is that they create a "continuous span". In the query below - I would like the functionality that LEFT_MOST_CONTINUOUS returns the minimum v_start of a continuous span (same for RIGHT_MOST_CONTINUOUS for maximum v_stop). Note that there might be more than a single continuous span (that should have different v_start and v_stop values).

Input:

WITH data AS (
    SELECT *
    FROM (VALUES 
        ('a', 2, 3),
        ('a', 3, 5),
        ('a', 5, 7),
        ('a', 8, 10),
        ('a', 10, 12),
        ('a', 12, 14),
        ('b', 7, 8),
        ('b', 8, 10),
        ('b', 12, 15),
        ('c', 10, 11)
    ) AS T(tag, v_start, v_stop)
    ORDER BY tag, v_start, v_stop
)
SELECT tag,
       LEFT_MOST_CONTINUOUS(v_start) OVER (PARTITION BY tag),
       RIGHT_MOST_CONTINUOUS(v_stop) OVER (PARTITION BY tag)
FROM data
ORDER BY 1, 2, 3

Where I expect to get the following output:

"a" 2   7
"a" 8   14
"b" 7   10
"b" 12  15
"c" 10  11

Since I want to merge the first 3 tuples (for tag "a") which are consecutive into a single value representing the entire span; same for the next 3 tuples (again for "a"). Then for "b" we can merge the next 2, but leave out the 3rd (which has it's v_start != the other's v_stop). And "c" there is nothing to merge with.

Help appreciated,

Tal

Mike Organek · Accepted Answer · 2022-01-13 20:42:27Z

2

You can use a gaps-and-islands approach by marking the first record of each group when either there is no previous record for the tag or the v_start is greater than v_stop of the previous record:

select tag, v_start, v_stop, 
         coalesce(lag(v_stop) over w < v_start, true) as is_end_grp
    from data
  window w as (partition by tag order by v_start)

Use a windowed sum() of the boolean is_end_grp cast to int (1 if true, 0 if false) to number the groups:

  select tag, sum(is_end_grp::int) over (partition by tag 
                                             order by v_start) as grp_num,
         v_start, v_stop
    from mark_gaps

Aggregation over (tag, grp_num) will produce your desired result:

select tag, min(v_start) as v_start, max(v_stop) as v_stop
  from numbered_groups
 group by tag, grp_num
 order by tag, v_start

Working DB<>Fiddle

answered Jan 13, 2022 at 20:42

Mike Organek

12.6k3 gold badges15 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

EoinS Over a year ago

Summing over is_end_grp had me stumped, great solution

Saiag Over a year ago

Thanks! really elegant & explained solution - love the use of grp_num

Saiag Over a year ago

excellent documentation link for the sum() over () window function, very useful for understanding the above answer.

EoinS · Accepted Answer · 2022-01-13 21:13:33Z

Using the numbered_groups logic from @Mike Organek answer. I just started from a different place

WITH data AS (
    SELECT *
    , case when lead(v_start) over(partition by tag order by v_start) = v_stop then 0 else 1 end stopcheck
    , case when lag(v_stop) over(partition by tag order by v_stop) = v_start then 0 else 1 end startcheck
    FROM (VALUES 
        ('a' , 2 , 3),
        ('a', 3, 5),
        ('a', 5, 7),
        ('a', 8, 10),
        ('a', 10, 12),
        ('a', 12, 14),
        ('b', 7, 8),
        ('b', 8, 10),
        ('b', 12, 15),
        ('c', 10, 11)
    ) AS T(tag, v_start, v_stop)
    ORDER BY tag, v_start, v_stop
)
,cnt as (
  select *
  , sum(startcheck) over (partition by tag order by v_start) grpn 
  from data)
select c1.tag, c1.v_start, c2.v_stop
from cnt c1 
inner join cnt c2 
  on c1.tag = c2.tag and c1.grpn = c2.grpn
  where c1.startcheck = 1 and c2.stopcheck = 1

This logic is all based on the assumption that your data always starts where the last row left off, there is no overlap etc.

Create a startcheck and stopcheck by comparing the prior row and next row relatively. From here use another window function sum() over to order the start records (so we don't match start of second batch to stop of first batch)

Join the table to itself matching like tag and groups. Filtering start and stop records

persian-theme · Accepted Answer · 2022-01-13 21:57:18Z

You can use following query

WITH data AS (
    SELECT *
    FROM (VALUES 
        ('a', 2, 3),
        ('a', 3, 5),
        ('a', 5, 7),
        ('a', 8, 10),
        ('a', 10, 12),
        ('a', 12, 14),
        ('b', 7, 8),
        ('b', 8, 10),
        ('b', 12, 15),
        ('c', 10, 11)
    ) AS T(tag, v_start, v_stop)
    ORDER BY tag, v_start, v_stop
),
cte1 as(
   select *,
      case
        when lag(v_stop)over(partition by tag order by(select null)) = v_start
          then 0
          else 1
      end as grp
  from data
),
cte2 as(
  select *, 
         sum(grp) over (partition by tag  order by v_start) as rnk
  from cte1
)

select tag,min(v_start)v_start,max(v_stop)v_stop
from cte2
group by tag,rnk
order by tag

Demo in db<>fiddle

Collectives™ on Stack Overflow

Postgres SQL - How do I aggregated groups of multiple consecutive rows?

3 Answers 3

3 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related