
I've got a Postgres database with a table containing a jsonb field called details:

timestamp              | sensor | details
-------------------------------------------
2018-10-10 08:00:00+01 | A      | [{"result": 1.0, "direction": "up"}, {"result": 2.0, "direction": "up"}]
2018-10-10 09:01:00+01 | A      | [{"result": 1.0, "direction": "up"}, {"result": 2.0, "direction": "up"}]
2018-10-10 09:01:00+01 | B      | [{"result": 3.0, "direction": "up"}, {"result": 4.0, "direction": "down"}]
2018-10-10 09:01:00+01 | B      | [{"result": 5.0, "direction": "up"}, {"result": 6.0, "direction": "up"}, {"result": 7.0, "direction": "down"}]
2018-10-10 09:01:00+01 | A      | [{"result": 8.0, "direction": "down"}, {"result": 9.0, "direction": "left"}, {"result": 10.0, "direction": "down"}]

I need to sum the result of all the up records and of all the down records per 15 minutes, grouped by sensor.

So with the help of this answer I got to this query:

SELECT
    CASE -- round by every 15 minutes
        WHEN extract('minute' from timestamp) < 15 THEN date_trunc('hour', timestamp)
        WHEN extract('minute' from timestamp) < 30 THEN date_trunc('hour', timestamp) + '15 minutes'
        WHEN extract('minute' from timestamp) < 45 THEN date_trunc('hour', timestamp) + '30 minutes'
        ELSE date_trunc('hour', timestamp) + '45 minutes'
    END AS timestamp_rounded,
    sensor,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'up') AS up_sum,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'down') AS down_sum,
    count(sensor) as record_count
FROM
    mytable,
    jsonb_array_elements(details) detail_elems
GROUP BY
    timestamp_rounded,
    sensor
ORDER BY 
    timestamp_rounded ASC,
    sensor ASC
;

Which results in this:

timestamp_rounded      | sensor | up_sum | down_sum | record_count
------------------------------------------------------------------
2018-10-10 08:00:00+01 | A      | 3.0    |          | 2  -- expected 1
2018-10-10 09:00:00+01 | A      | 3.0    | 18.0     | 5  -- expected 2
2018-10-10 09:00:00+01 | B      | 14.0   | 11.0     | 5  -- expected 2

The sums are correct, but unfortunately the record_count is not. I expect it to show the number of records from the original table that go into each line. Instead, it shows the number of elements from the details field used for that line. I added a comment after each line to show the numbers I expect.

Does anybody know why I get the count of the detail_elems rows instead of the mytable records?

[EDIT]

So I now have it working, but with a correlated subquery (yikes!):

WITH tmp AS (
SELECT
    *,
    CASE -- round by every 15 minutes
        WHEN extract('minute' from timestamp) < 15 THEN date_trunc('hour', timestamp)
        WHEN extract('minute' from timestamp) < 30 THEN date_trunc('hour', timestamp) + '15 minutes'
        WHEN extract('minute' from timestamp) < 45 THEN date_trunc('hour', timestamp) + '30 minutes'
        ELSE date_trunc('hour', timestamp) + '45 minutes'
    END AS timestamp_rounded
FROM
    mytable
)
SELECT 
    timestamp_rounded,
    sensor,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'up') AS up_sum,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'down') AS down_sum,
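    -- correlated subquery: re-counts the group for every output row, which is slow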
    (SELECT count(*) from tmp tmp2 where tmp2.timestamp_rounded=tmp.timestamp_rounded and tmp2.sensor=tmp.sensor) as record_count_correct
    -- COUNT(*) OVER (PARTITION BY sensor, extract(year from timestamp_rounded), extract(month from timestamp_rounded), extract(day from timestamp_rounded), extract(hour from timestamp_rounded), extract(minute from timestamp_rounded)) AS record_count_incorrect
FROM 
    tmp,
    jsonb_array_elements(details) detail_elems
GROUP BY
    timestamp_rounded,
    sensor
ORDER BY 
    timestamp_rounded ASC,
    sensor ASC
;

This works, but unfortunately it is very slow (10 minutes on an export from our production database running on my laptop). I did some profiling and the subquery is clearly the problem (without it, it takes about 6 seconds).

So I tried the COUNT(*) OVER (PARTITION BY ...) pattern (commented out in the query above), partitioning by the sensor and the rounded timestamp, but that gives incorrect results.

Here's a fiddle of my current state: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=65074f510de678b2a60255483bdb0be4

Any ideas how to correctly implement the PARTITION BY in this current state?

2 Answers


demo:db<>fiddle

Because you want to count the records of the original, unexpanded table, you should do the count first, before expanding the JSONB array (e.g. in a subquery using a COUNT(*) window function):

SELECT 
    sensor,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
    SUM((elems ->> 'result')::numeric) 
        FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum,
    cnt
FROM (
    SELECT
        *,
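        -- count original rows per sensor, before jsonb_array_elements expands them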
        COUNT(*) OVER (PARTITION BY sensor) AS cnt
    FROM
        mytable   
    )s,
    jsonb_array_elements(details) elems
GROUP BY sensor, cnt
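
Note that cnt appears in GROUP BY only because every selected column must be either aggregated or grouped; it is constant within each sensor partition, so grouping by it does not split the groups.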

Edit: The question was updated to add 15-minute time slots.

So the time slots can be used to create the partitions:

demo:db<>fiddle

SELECT 
    sensor,
    to_timestamp(time_slot * 60 * 15) as time_slot,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
    SUM((elems ->> 'result')::numeric) 
        FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum,
    cnt
FROM (
    SELECT
        *,
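        -- integer division of the epoch buckets timestamps into 15-minute slots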
        EXTRACT(epoch from "timestamp")::int / 60 / 15 as time_slot,
        COUNT(*) OVER (PARTITION BY EXTRACT(epoch from "timestamp")::int / 60 / 15, sensor) AS cnt
    FROM
        mytable   
    )s,
    jsonb_array_elements(details) elems
GROUP BY sensor, time_slot, cnt
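
As an aside, on PostgreSQL 14 or later the 15-minute bucketing can be done with the built-in date_bin() function instead of the epoch arithmetic or the CASE expression (not available on the Postgres 11 used in the fiddles); a minimal sketch:

SELECT
    -- date_bin() buckets "timestamp" into 15-minute slots counted from the origin
    date_bin('15 minutes', "timestamp", TIMESTAMPTZ '2001-01-01') AS time_slot,
    sensor
FROM
    mytable;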

4 Comments

Hi S-Man. It took me some time to fully comprehend what was going on and to fit it into the larger (humongous) query I have, but I succeeded and it works perfectly! The dbfiddle thing is really great, by the way; I'm now using it more as well. Thanks!
@kramer65 Thanks for the upvote. I would appreciate it if you could accept the answer as well to close it. :)
Ok, in the end it didn't work out, simply because I didn't test enough and didn't give you the full picture. The thing is that I also need to PARTITION BY the timestamp, which I group per 15 minutes. I got it working with a subquery, but that is very slow, and I can't get PARTITION BY working with the rounded timestamp either. I rewrote the question with more elaborate examples and added a fiddle. I understand that it has kind of become a different question, but it would be really awesome if you could have another short look.
Works like a charm, and cut the query time on production data from 10 minutes to 6 seconds; awesome! Indeed, you solved it twice, plus I learned a lot on the way. You made my week!

You get 5 and 5 per sensor because when you write:

FROM
    mytable,
    jsonb_array_elements(details) elems

you get the Cartesian product of each row with its expanded details elements:

SELECT
    sensor,
    elems.*
FROM
    mytable,
    jsonb_array_elements(details) elems; -- 5 rows for A and 5 for B

"A";"{"result": 1.0, "direction": "up"}"
"A";"{"result": 2.0, "direction": "up"}"
"B";"{"result": 3.0, "direction": "up"}"
"B";"{"result": 4.0, "direction": "down"}"
"B";"{"result": 5.0, "direction": "up"}"
"B";"{"result": 6.0, "direction": "up"}"
"B";"{"result": 7.0, "direction": "down"}"
"A";"{"result": 8.0, "direction": "down"}"
"A";"{"result": 9.0, "direction": "left"}"
"A";"{"result": 10.0, "direction": "down"}"

You can write your query with a subquery like this:

SELECT
    sensor,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum,
    (SELECT count(*) FROM mytable i_myt
     WHERE i_myt.sensor = mytable.sensor) AS record_count -- subquery to count
FROM
    mytable,
    jsonb_array_elements(details) elems
GROUP BY sensor;

"B";14.0;11.0;2
"A";3.0;18.0;2

2 Comments

I did this, and it works, but unfortunately it's way too slow. Any ideas how to rewrite it so that it is more performant?
You can add an index on the sensor column, or use a WITH clause in which you calculate the record_count first. Have you used the WITH clause before?
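
A minimal sketch of that WITH-clause idea, assuming the same mytable schema as above: the per-sensor counts are computed once on the unexpanded table and joined back in.

WITH counts AS (
    -- count the original rows once, before any JSONB expansion
    SELECT sensor, count(*) AS record_count
    FROM mytable
    GROUP BY sensor
)
SELECT
    m.sensor,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'up') AS up_sum,
    SUM((elems ->> 'result')::numeric)
        FILTER (WHERE elems ->> 'direction' = 'down') AS down_sum,
    c.record_count
FROM
    mytable m
    JOIN counts c ON c.sensor = m.sensor
    CROSS JOIN LATERAL jsonb_array_elements(m.details) elems
GROUP BY m.sensor, c.record_count;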
