
I've got a Postgres database with a table containing a jsonb field called details:

timestamp              | sensor | details
-------------------------------------------
2018-10-10 08:00:00+01 | A      | [{"result": 1.0, "direction": "up"}, {"result": 2.0, "direction": "up"}]
2018-10-10 09:01:00+01 | A      | [{"result": 1.0, "direction": "up"}, {"result": 2.0, "direction": "up"}]
2018-10-10 09:01:00+01 | B      | [{"result": 3.0, "direction": "up"}, {"result": 4.0, "direction": "down"}]
2018-10-10 09:01:00+01 | B      | [{"result": 5.0, "direction": "up"}, {"result": 6.0, "direction": "up"}, {"result": 7.0, "direction": "down"}]
2018-10-10 09:01:00+01 | A      | [{"result": 8.0, "direction": "down"}, {"result": 9.0, "direction": "left"}, {"result": 10.0, "direction": "down"}]

I need to sum the result of all the up records and of all the down records per 15 minutes, grouped by sensor.

So with the help of this answer I got to this query:

SELECT
    CASE -- round by every 15 minutes
        WHEN extract('minute' from timestamp) < 15 THEN date_trunc('hour', timestamp)
        WHEN extract('minute' from timestamp) < 30 THEN date_trunc('hour', timestamp) + '15 minutes'
        WHEN extract('minute' from timestamp) < 45 THEN date_trunc('hour', timestamp) + '30 minutes'
        ELSE date_trunc('hour', timestamp) + '45 minutes'
    END AS timestamp_rounded,
    sensor,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'up') AS up_sum,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'down') AS down_sum,
    count(sensor) as record_count
FROM
    mytable,
    jsonb_array_elements(details) detail_elems
GROUP BY
    timestamp_rounded,
    sensor
ORDER BY 
    timestamp_rounded ASC,
    sensor ASC
;

Which results in this:

timestamp_rounded      | sensor | up_sum | down_sum | record_count
------------------------------------------------------------------
2018-10-10 08:00:00+01 | A      | 3.0    |          | 2  -- expected 1
2018-10-10 09:00:00+01 | A      | 3.0    | 18.0     | 5  -- expected 2
2018-10-10 09:00:00+01 | B      | 14.0   | 11.0     | 5  -- expected 2

The sums are correct, but unfortunately the record_count is not. I expect it to show the number of records from the original table that go into each line. Instead, it shows the number of elements from the details field used for that line. I added a comment after each line to show the numbers I expect.

Does anybody know why I get the count of the detail_elems rows instead of the mytable records?

[EDIT]

So I now have it working, but with a correlated subquery (yikes!):

WITH tmp AS (
SELECT
    *,
    CASE -- round by every 15 minutes
        WHEN extract('minute' from timestamp) < 15 THEN date_trunc('hour', timestamp)
        WHEN extract('minute' from timestamp) < 30 THEN date_trunc('hour', timestamp) + '15 minutes'
        WHEN extract('minute' from timestamp) < 45 THEN date_trunc('hour', timestamp) + '30 minutes'
        ELSE date_trunc('hour', timestamp) + '45 minutes'
    END AS timestamp_rounded
FROM
    mytable
)
SELECT 
    timestamp_rounded,
    sensor,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'up') AS up_sum,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'down') AS down_sum,
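    -- correlated subquery: re-counts the group for every output row, which is slow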
    (SELECT count(*) from tmp tmp2 where tmp2.timestamp_rounded=tmp.timestamp_rounded and tmp2.sensor=tmp.sensor) as record_count_correct
    -- COUNT(*) OVER (PARTITION BY sensor, extract(year from timestamp_rounded), extract(month from timestamp_rounded), extract(day from timestamp_rounded), extract(hour from timestamp_rounded), extract(minute from timestamp_rounded)) AS record_count_incorrect
FROM 
    tmp,
    jsonb_array_elements(details) detail_elems
GROUP BY
    timestamp_rounded,
    sensor
ORDER BY 
    timestamp_rounded ASC,
    sensor ASC
;

This works, but unfortunately it is very slow (10 minutes on an export from our production database running on my laptop). I did some profiling and the subquery is clearly the problem (without it, it takes about 6 seconds).

So I tried the COUNT(*) OVER (PARTITION BY ...) pattern (commented out in the query above), partitioning by the sensor and the rounded timestamp, but that gives incorrect results.

Here's a fiddle of my current state: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=65074f510de678b2a60255483bdb0be4

Any ideas how to correctly implement the PARTITION BY in this current state?

2 Answers


demo:db<>fiddle

Because you want to count the records of the original, unexpanded table, you should do the count first, before expanding the JSONB array (e.g. in a subquery using a COUNT(*) window function):

SELECT 
    sensor,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
    SUM((elems ->> 'result')::numeric) 
        FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum,
    cnt
FROM (
    SELECT
        *,
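        -- count original rows per sensor, before jsonb_array_elements expands them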
        COUNT(*) OVER (PARTITION BY sensor) AS cnt
    FROM
        mytable   
    )s,
    jsonb_array_elements(details) elems
GROUP BY sensor, cnt
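
Note that cnt appears in GROUP BY only because every selected column must be either aggregated or grouped; it is constant within each sensor partition, so grouping by it does not split the groups.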

Edit: The question was updated to add 15-minute time slots.

So the time slots can be used to create the partitions:

demo:db<>fiddle

SELECT 
    sensor,
    to_timestamp(time_slot * 60 * 15) as time_slot,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
    SUM((elems ->> 'result')::numeric) 
        FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum,
    cnt
FROM (
    SELECT
        *,
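        -- integer division of the epoch buckets timestamps into 15-minute slots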
        EXTRACT(epoch from "timestamp")::int / 60 / 15 as time_slot,
        COUNT(*) OVER (PARTITION BY EXTRACT(epoch from "timestamp")::int / 60 / 15, sensor) AS cnt
    FROM
        mytable   
    )s,
    jsonb_array_elements(details) elems
GROUP BY sensor, time_slot, cnt
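
As an aside, on PostgreSQL 14 or later the 15-minute bucketing can be done with the built-in date_bin() function instead of the epoch arithmetic or the CASE expression (not available on the Postgres 11 used in the fiddles); a minimal sketch:

SELECT
    -- date_bin() buckets "timestamp" into 15-minute slots counted from the origin
    date_bin('15 minutes', "timestamp", TIMESTAMPTZ '2001-01-01') AS time_slot,
    sensor
FROM
    mytable;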

4 Comments

Hi S-Man. It took me some time to fully comprehend what was going on and to fit it into the larger (humongous) query I have, but I succeeded and it works perfectly! The dbfiddle thing is really great, by the way; I'm now using it more as well. Thanks!
@kramer65 Thanks for the upvote. I would appreciate it if you could accept the answer as well to close it. :)
Ok, in the end it didn't work out, simply because I didn't test enough and didn't give you the full picture. The thing is that I also need to PARTITION BY the timestamp, which I group per 15 minutes. I got it working with a subquery, but that is very slow, and I can't get PARTITION BY working with the rounded timestamp either. I rewrote the question with more elaborate examples and added a fiddle. I understand that it has kind of become a different question, but it would be really awesome if you could have another short look.
Works like a charm, and cut the query time on production data from 10 minutes to 6 seconds; awesome! Indeed, you solved it twice, plus I learned a lot on the way. You made my week!

You get 5 and 5 per sensor because when you write:

FROM
    mytable,
    jsonb_array_elements(details) elems

you get the Cartesian product of each row with its expanded details elements:

SELECT
    sensor,
    elems.*
FROM
    mytable,
    jsonb_array_elements(details) elems; -- 5 rows for A and 5 for B

"A";"{"result": 1.0, "direction": "up"}"
"A";"{"result": 2.0, "direction": "up"}"
"B";"{"result": 3.0, "direction": "up"}"
"B";"{"result": 4.0, "direction": "down"}"
"B";"{"result": 5.0, "direction": "up"}"
"B";"{"result": 6.0, "direction": "up"}"
"B";"{"result": 7.0, "direction": "down"}"
"A";"{"result": 8.0, "direction": "down"}"
"A";"{"result": 9.0, "direction": "left"}"
"A";"{"result": 10.0, "direction": "down"}"

You can write your query with a subquery like this:

SELECT
    sensor,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum,
    (SELECT count(*) FROM mytable i_myt
     WHERE i_myt.sensor = mytable.sensor) AS record_count -- subquery to count
FROM
    mytable,
    jsonb_array_elements(details) elems
GROUP BY sensor;

"B";14.0;11.0;2
"A";3.0;18.0;2

2 Comments

I did this, and it works, but unfortunately it's way too slow. Any ideas how to rewrite it so that it is more performant?
You can add an index on the sensor column, or use a WITH clause in which you calculate the record_count first. Have you used the WITH clause before?
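
A minimal sketch of that WITH-clause idea, assuming the same mytable schema as above: the per-sensor counts are computed once on the unexpanded table and joined back in.

WITH counts AS (
    -- count the original rows once, before any JSONB expansion
    SELECT sensor, count(*) AS record_count
    FROM mytable
    GROUP BY sensor
)
SELECT
    m.sensor,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum,
    c.record_count
FROM
    mytable m
    JOIN counts c ON c.sensor = m.sensor
    CROSS JOIN LATERAL jsonb_array_elements(m.details) elems
GROUP BY m.sensor, c.record_count;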
