I have a Postgres database with a table containing a jsonb column called details:
timestamp | sensor | details
-------------------------------------------
2018-10-10 08:00:00+01 | A | [{"result": 1.0, "direction": "up"}, {"result": 2.0, "direction": "up"}]
2018-10-10 09:01:00+01 | A | [{"result": 1.0, "direction": "up"}, {"result": 2.0, "direction": "up"}]
2018-10-10 09:01:00+01 | B | [{"result": 3.0, "direction": "up"}, {"result": 4.0, "direction": "down"}]
2018-10-10 09:01:00+01 | B | [{"result": 5.0, "direction": "up"}, {"result": 6.0, "direction": "up"}, {"result": 7.0, "direction": "down"}]
2018-10-10 09:01:00+01 | A | [{"result": 8.0, "direction": "down"}, {"result": 9.0, "direction": "left"}, {"result": 10.0, "direction": "down"}]
I need to sum the result values of all the up entries and down entries per 15 minutes, grouped by sensor.
So with the help of this answer I got to this query:
SELECT
    CASE -- round by every 15 minutes
        WHEN extract('minute' from timestamp) < 15 THEN date_trunc('hour', timestamp)
        WHEN extract('minute' from timestamp) < 30 THEN date_trunc('hour', timestamp) + '15 minutes'
        WHEN extract('minute' from timestamp) < 45 THEN date_trunc('hour', timestamp) + '30 minutes'
        ELSE date_trunc('hour', timestamp) + '45 minutes'
    END AS timestamp_rounded,
    sensor,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'up') AS up_sum,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'down') AS down_sum,
    count(sensor) AS record_count
FROM
    mytable,
    jsonb_array_elements(details) AS detail_elems
GROUP BY
    timestamp_rounded,
    sensor
ORDER BY
    timestamp_rounded ASC,
    sensor ASC
;
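As an aside, on PostgreSQL 14 or later the whole CASE expression could be replaced by date_bin, which buckets a timestamp into fixed-width intervals (the fiddle below runs Postgres 11, so this is only an option on newer servers):

```sql
-- date_bin(stride, source, origin) rounds "source" down to the nearest
-- multiple of "stride" counted from "origin" (PostgreSQL 14+):
SELECT date_bin('15 minutes', "timestamp", TIMESTAMP '2001-01-01') AS timestamp_rounded
FROM mytable;
```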
Which results in this:
timestamp_rounded | sensor | up_sum | down_sum | record_count
------------------------------------------------------------------
2018-10-10 08:00:00+01 | A | 3.0 | | 2 -- expected 1
2018-10-10 09:00:00+01 | A | 3.0 | 18.0 | 5 -- expected 2
2018-10-10 09:00:00+01 | B | 14.0 | 11.0 | 5 -- expected 2
The sums are correct, but unfortunately the record_count is not. I expect it to count the rows from the original table that contribute to each line; instead it counts the array elements from the details field. I added a comment behind each line to show the numbers I expect.
Does anybody know why I get the count of the detail_elems rows instead of the mytable rows?
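For context: the comma between mytable and jsonb_array_elements(details) is an implicit lateral join, so each table row is expanded into one output row per array element, and that expanded set is what count(sensor) sees. One possible workaround (a sketch; ctid is Postgres's built-in physical row identifier, used here only because no unique key column is assumed) is to count distinct source rows instead:

```sql
-- count(DISTINCT mytable.ctid) counts each original row once,
-- no matter how many array elements it was expanded into:
SELECT
    sensor,
    count(sensor)                AS element_count,  -- one per array element
    count(DISTINCT mytable.ctid) AS record_count    -- one per table row
FROM
    mytable,
    jsonb_array_elements(details) AS detail_elems
GROUP BY
    sensor;
```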
[EDIT]
So I now have it working, but with a correlated subquery (yikes!):
WITH tmp AS (
    SELECT
        *,
        CASE -- round by every 15 minutes
            WHEN extract('minute' from timestamp) < 15 THEN date_trunc('hour', timestamp)
            WHEN extract('minute' from timestamp) < 30 THEN date_trunc('hour', timestamp) + '15 minutes'
            WHEN extract('minute' from timestamp) < 45 THEN date_trunc('hour', timestamp) + '30 minutes'
            ELSE date_trunc('hour', timestamp) + '45 minutes'
        END AS timestamp_rounded
    FROM
        mytable
)
SELECT
    timestamp_rounded,
    sensor,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'up') AS up_sum,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'down') AS down_sum,
    (SELECT count(*) FROM tmp tmp2 WHERE tmp2.timestamp_rounded = tmp.timestamp_rounded AND tmp2.sensor = tmp.sensor) AS record_count_correct
    -- COUNT(*) OVER (PARTITION BY sensor, extract(year from timestamp_rounded), extract(month from timestamp_rounded), extract(day from timestamp_rounded), extract(hour from timestamp_rounded), extract(minute from timestamp_rounded)) AS record_count_incorrect
FROM
    tmp,
    jsonb_array_elements(details) AS detail_elems
GROUP BY
    timestamp_rounded,
    sensor
ORDER BY
    timestamp_rounded ASC,
    sensor ASC
This works, but unfortunately it is very slow (10 minutes on an export from our production database, running on my laptop). Some profiling shows the correlated subquery is clearly the problem: without it, the query takes about 6 seconds.
So I tried the COUNT(*) OVER (PARTITION BY ...) pattern (commented out in the query above), partitioning by the sensor and the components of the timestamp, but that gives incorrect results.
Here's a fiddle of my current state: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=65074f510de678b2a60255483bdb0be4
Any ideas how to correctly implement the PARTITION BY in this current state?
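One avenue that avoids both the correlated subquery and the window function (a sketch against the tmp CTE above, assuming every details array is non-empty): expand the array WITH ORDINALITY and count only the first element of each array, which yields exactly one count per original row:

```sql
SELECT
    timestamp_rounded,
    sensor,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'up') AS up_sum,
    SUM((detail_elems ->> 'result')::numeric) FILTER (WHERE detail_elems ->> 'direction' = 'down') AS down_sum,
    -- ord = 1 holds for exactly one expanded row per source row:
    count(*) FILTER (WHERE ord = 1) AS record_count
FROM
    tmp,
    jsonb_array_elements(details) WITH ORDINALITY AS d(detail_elems, ord)
GROUP BY
    timestamp_rounded,
    sensor
ORDER BY
    timestamp_rounded,
    sensor;
```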