0

I have a user who has done a sequence of events. I want to capture the number of times they have done each event and in which order.

So for the following table user_events:

name  eventname  time    
Ted   a          12:01
Ted   b          12:02
Ted   b          12:03
Ted   b          12:04
Ted   c          12:05
Ted   b          12:06
Ted   b          12:07
Ted   c          12:08
Ted   b          12:09
Ted   b          12:11
Ted   b          12:12

I should get:

name  eventname  event_sequence_number  time_started  frequency
Ted   a          1                      12:01         1
Ted   b          2                      12:02         3
Ted   c          3                      12:05         1
Ted   b          4                      12:06         2
Ted   c          5                      12:08         1
Ted   b          6                      12:09         3

I've been attempting with rank(), dense_rank(), row_number() and lag() but can't put it all together. Any ideas?

1 Answer 1

1

Try this. It uses the Tabibitosan method ( grouping sequence ranges ) : Toolbox

SQL Fiddle

PostgreSQL 9.6 Schema Setup:

CREATE TABLE user_events
    (user_name varchar(3), eventname varchar(1), event_time time)
;

INSERT INTO user_events
    (user_name, eventname, event_time)
VALUES
    ('Ted', 'a', '12:01'),
    ('Ted', 'b', '12:02'),
    ('Ted', 'b', '12:03'),
    ('Ted', 'b', '12:04'),
    ('Ted', 'c', '12:05'),
    ('Ted', 'b', '12:06'),
    ('Ted', 'b', '12:07'),
    ('Ted', 'c', '12:08'),
    ('Ted', 'b', '12:09'),
    ('Ted', 'b', '12:11'),
    ('Ted', 'b', '12:12')
;

Query 1:

SELECT t.user_name
    ,t.eventname
    ,row_number() OVER (
        ORDER BY MIN(event_time)
        ) AS event_sequence_number
    ,MIN(event_time) AS time_started
    ,COUNT(*) as frequency
FROM (
    SELECT user_name
        ,eventname
        ,event_time
        ,row_number() OVER (
            ORDER BY event_time
            ) - row_number() OVER (
            PARTITION BY eventname ORDER BY event_time
                ,eventname
            )  seq
    FROM user_events
    ) t
GROUP BY user_name
    ,eventname
    ,seq
ORDER BY time_started

Results:

| user_name | eventname | event_sequence_number | time_started | frequency |
|-----------|-----------|-----------------------|--------------|-----------|
|       Ted |         a |                     1 |     12:01:00 |         1 |
|       Ted |         b |                     2 |     12:02:00 |         3 |
|       Ted |         c |                     3 |     12:05:00 |         1 |
|       Ted |         b |                     4 |     12:06:00 |         2 |
|       Ted |         c |                     5 |     12:08:00 |         1 |
|       Ted |         b |                     6 |     12:09:00 |         3 |
Sign up to request clarification or add additional context in comments.

1 Comment

Beautiful piece of code - I had created something similar in the end. Extremely useful for creating a cleaner path analysis from aggregating millions of online user behaviour.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.