1

I'm not new to SQL, but I am new to PostgreSQL and am really struggling to adapt my current knowledge in a different environment.

I am trying to create a variable that captures whether or not someone stays active, skips, or churns within a 0/1 time series variable. For example, in the data below, my dataset would include the variables id,time, and voted, and I would create the variable "skipped":

id     time     voted     skipped
1      1        1         active
1      2        0         skipped
1      3        1         active
2      1        1         active
2      2        0         churned
2      3        0         churned
3      1        1         active
3      2        1         active
3      3        0         churned

The rule for coding "skipped" is pretty simple: If 1 is the last record, the person is "active" and any zeroes count as "skipped", but if 0 is the last record, the person is "churned".

The record with id = 1 is a skip because id is non-zero at time 3 after being 0 at time 2. The other two cases, 0 is the final value so they are "churned". Can anyone help? I've been noodling on it all day, and am hitting a wall.

2 Answers 2

3

This isn't particularly elegant, but it should meet your needs:

with votes as (
  select
    id, time, voted,
    max(time) over (partition by id) as max_time
  from voter_data
)
select
  v1.id, v1.time, v1.voted,
  case
    when v1.voted = 1 then 'active'
    when v2.voted = 1 then 'skipped'
    else 'churned'
  end as skipped
from
  votes v1
  join votes v2 on
    v1.id = v2.id and
    v1.max_time = v2.time

In a nutshell, we first figure out which is the last record for each voter id, and then we do a self-join on the resulting table to isolate only that last id.

There is a chance this could produce multiple results -- if it's possible to have the same ID vote twice at the same time. If that's the case, you want row_number() instead of max().

Results on your data:

1   1   1   'active'
1   2   0   'skipped'
1   3   1   'active'
2   1   1   'active'
2   2   0   'churned'
2   3   0   'churned'
3   1   1   'active'
3   2   1   'active'
3   3   0   'churned'
Sign up to request clarification or add additional context in comments.

Comments

1

Window functions can help for readability when working with self-referential joins.

WITH
  add_last_voted_status AS (
    SELECT
      *
    , LAST_VALUE(voted) OVER (
        PARTITION BY id
        ORDER BY time
      ) AS last_voted_status
    FROM table
)

SELECT
  id
, time
, voted
, CASE
    WHEN last_voted_status = 0
      THEN 'churned'
    WHEN last_voted_status = 1 AND voted = 1
      THEN 'active'
    WHEN last_voted_status = 1 AND voted = 0
      THEN 'skipped'
    ELSE '?'
  END AS skipped
FROM add_last_voted_status

1 Comment

last_value! That's the one I was trying to remember... nicely done.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.