3

Let say I have table like this:

+-----+-----------------+-------------+
| ID  |     Points      | BreakPoints |
+-----+-----------------+-------------+
| 123 | {6,8,1,3,7,9}   | {1,7}       |
| 456 | {16,9,78,96,33} | {78}        |
+-----+-----------------+-------------+

I want to "break" these Points sequences on points contained in BreakPoints, while keeping ID of original row. Order of elements in sequence is important, so I cannot sort them!

Also notice, that break points are in both result rows that came from breaking original sequence at that break point (on the end and start respectively). So result should be something like this:


+-----+------------+
| ID  |   Points   |
+-----+------------+
| 123 | {6,8,1}    |
| 123 | {1,3,7}    |
| 123 | {7,9}      |
| 456 | {16,9,78}  |
| 456 | {78,96,33} |
+-----+------------+

Of course, I can write PL/pgSQL function, call it for every row, iterate the array and RETURN NEXT for every sub-sequence. But is there any other way, without calling function for all rows?

2
  • 1
    Why is "8" in the first set? Commented Jul 16, 2020 at 12:24
  • @GordonLinoff because it is about position in the array, not element "value". (elements are in fact IDs themselves, their "value" is irrelevant). Commented Jul 16, 2020 at 13:00

2 Answers 2

2
WITH data(id, points, breakpoints) AS (
    VALUES (123, ARRAY [6,8,1,3,7,9], ARRAY [7, 1])
         , (456, ARRAY [16,9,78,96,33], ARRAY [78])
),
-- we'll map the breakpoints to the indices where they appear in `points` and sort this array
-- so, ARRAY[1, 7] -> ARRAY[3, 5] (the positions of 1 & 7 in `points`, arrays are 1-based)
-- and ARRAY[7, 1] -> ARRAY[3, 5] (since we sort this new 'breakpoint_indices' array)
sorted_breakpoint_indices(id, points, breakpoint_indices, number_of_breakpoints) AS (
    SELECT id
         , points
         , breakpoint_indices
         , number_of_breakpoints
    FROM data
    JOIN LATERAL (
        SELECT ARRAY_AGG(array_position(points, breakpoint) ORDER BY array_position(points, breakpoint))
             , COUNT(*) -- simply here to avoid multiple `cardinality(breakpoint_indices)` below
        FROM unnest(breakpoints) AS breakpoint
    ) AS f(breakpoint_indices, number_of_breakpoints)
    ON true
)
SELECT id
     , CASE i
         -- first segment, from start to breakpoint #1
         WHEN 0 THEN points[:breakpoint_indices[1]]
         -- last segment, from last breakpoint to end
         WHEN number_of_breakpoints THEN points[breakpoint_indices[number_of_breakpoints]:]
         -- default case, bp i to i+1
         ELSE points[breakpoint_indices[i]:breakpoint_indices[i+1]]
       END
FROM sorted_breakpoint_indices
   , generate_series(0, number_of_breakpoints, 1) AS f(i)

returns

+---+----------+
|id |result    |
+---+----------+
|123|{6,8,1}   |
|123|{1,3,7}   |
|123|{7,9}     |
|456|{16,9,78} |
|456|{78,96,33}|
+---+----------+

Note: I wrote multiple other versions of this while writing this answer, these can be seen by looking at this post's edit history

Sign up to request clarification or add additional context in comments.

6 Comments

Very interesting approach :) One more thing that is probably not clear for my sample (bummer): I have no assurance, that breakpoints are in the same order that they appear in the sequence. So, if you change {1,7} in the first row to {7,1}, it should still return my proposed result. I will try to adapt your solution to this.
@rouen: Ok, I might have a way to fix that. One question though, can breakpoints be duplicated? I.e breakpoints = Array[1, 1, 7] with 1 appearing twice in points (and being 'consumed' the first time you encounter it).
Nope, no duplicates in breakpoints. Also no duplicates possible in original sequences, if that matters :)
You are SQL beast, man! Thank you. Now for the interesting part: benchmarking this on millions of rows :)
Hey, just to let you know: it works really well on real dataset. It splits ~700k sequences in about 5 seconds on pretty average server :)
|
1

I think this does what you want:

select t.id,
       array_agg(point order by point)
from t cross join
     unnest(points) point cross join lateral
     (select lag(breakpoint) over (order by breakpoint) as prev_breakpoint, breakpoint
      from unnest(t.breakpoints) breakpoint
      union all
      select max(breakpoint), null
      from unnest(t.breakpoints) breakpoint
     ) b
where (point >= prev_breakpoint or prev_breakpoint is null) and
      (point <= breakpoint or breakpoint is null)
group by t.id, breakpoint;

Here is a db<>fiddle.

EDIT:

Here is the revised code for solving your actual problem:

select id, grp,
       (case when lead(max(breakpoint)) over (partition by id order by grp) is not null
             then array_agg(point order by n) || lead(max(breakpoint)) over (partition by id order by grp)
             else array_agg(point order by n)
        end) as next_breakpoint
from (select t.id, p.*, breakpoint,
             count(breakpoint) over (partition by t.id order by p.n) as grp
      from t cross join
           unnest(points) with ordinality as p(point, n) left join
           unnest(breakpoints) as breakpoint
           on p.point = breakpoint
     ) t
group by id, grp;

This has been included in the db<>fidde.

The idea is pretty simply. Just return the position of each point and match to the breakpoints. Then a window function is used to define the groups.

The only complication is on the aggregation. You want the break point in both records. So that requires some manipulation. I think the array manipulation with lead() is simpler than alternatives.

2 Comments

It does not. Maybe I should clarify that order of elements is important, I can not "sort" them. I will add that into question.
@rouen . . . Well, I like my first answer, even if it is to the wrong question. You can always ask another question. More seriously, I updated the answer with a similar approach for your actual problem.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.