I have a table called device_data that looks like below
Column | Type | Collation | Nullable | Default
----------------+-----------------------------+-----------+----------+-------------------------------------------------
id | integer | | | nextval('device_data_id_seq'::regclass)
date | timestamp without time zone | | |
packet_drop | real | | |
jitter | real | | |
latency | real | | |
alert | character varying(50) | | |
It basically stores the packet drops, jitter, latency on a minute basis.
Now there is another column called alert that holds the value HIGH, MEDIUM and LOW based on a threshold and holds empty string if the threshold is not met.
Now I do an hourly average on the table, my query looks like below
SELECT date_trunc('hour', date) AS hourly,
avg(jitter),
avg(latency),
avg(packet_drop)
FROM device_data
WHERE date BETWEEN '2022-11-26 17:41:11' AND '2022-11-27 17:41:11'
GROUP BY hourly
ORDER BY hourly;
Output
hourly | avg | avg | avg
---------------------+--------------------+--------------------+---------------------
2022-11-26 17:00:00 | 3.52857138642243 | 2.771428568022592 | 0
2022-11-26 18:00:00 | 2.484615419346553 | 2.815384602546692 | 0
2022-11-26 19:00:00 | 2.218461540570626 | 2.723076921242934 | 0
2022-11-26 20:00:00 | 5.098461512992015 | 2.7076923021903405 | 0
2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 | 0
2022-11-26 22:00:00 | 5.672307815345434 | 2.810769222332881 | 0
2022-11-26 23:00:00 | 2.7828124976949766 | 2.893749985843897 | 0
2022-11-27 00:00:00 | 2.6046153992414474 | 2.8030769238105187 | 0
2022-11-27 01:00:00 | 3.846031717837803 | 2.8333333200878568 | 0
...
(25 rows)
Next I do an hourly average on the table with additional column alert, so my query looks something like below
SELECT date_trunc('hour', date) as hourly,
avg(jitter),
avg(latency),
avg(packet_drop),
alert from device_data
WHERE date BETWEEN '2022-11-26 17:41:11' AND '2022-11-27 17:41:11'
GROUP BY hourly, alert
ORDER BY hourly;
Output
hourly | avg | avg | avg | alert
---------------------+--------------------+--------------------+---------------------+----------------
2022-11-26 17:00:00 | 1.2649999938905239 | 2.755000001192093 | 0 |
2022-11-26 17:00:00 | 48.79999923706055 | 3.0999999046325684 | 0 | MEDIUM
2022-11-26 18:00:00 | 2.484615419346553 | 2.815384602546692 | 0 |
2022-11-26 19:00:00 | 2.218461540570626 | 2.723076921242934 | 0 |
2022-11-26 20:00:00 | 2.6603174461495307 | 2.695238092589 | 0 |
2022-11-26 20:00:00 | 106.5999984741211 | 3.0999999046325684 | 0 | HIGH
2022-11-26 20:00:00 | 57.20000076293945 | 3.0999999046325684 | 0 | MEDIUM
2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 | 0 |
2022-11-26 22:00:00 | 2.8349206649831364 | 2.793650784189739 | 0 |
2022-11-26 22:00:00 | 95.05000305175781 | 3.350000023841858 | 0 | HIGH
2022-11-26 23:00:00 | 2.7828124976949766 | 2.893749985843897 | 0 |
2022-11-27 00:00:00 | 2.132812526775524 | 2.796875 | 0 |
2022-11-27 00:00:00 | 32.79999923706055 | 3.200000047683716 | 0 | LOW
2022-11-27 01:00:00 | 1.849999995962266 | 2.822580631702177 | 0 |
2022-11-27 01:00:00 | 127.5999984741211 | 3.5 | 0 | HIGH
...
(35 rows)
As you can see in my second query, some of the rows have duplicate timestamps like 2022-11-26 17:00:00, 2022-11-26 20:00:00 and so on.
I understand that when averaging out the columns, in the case of alert column, it encounters multiple values be it empty string along with some HIGH, MEDIUM or LOW strings or some other combinations resulting in multiple rows for the same timestamp.
Requirement
What I want is to simply average out the values (initially ignoring the alert column values) and then check for HIGH string in the alert column for that particular hour (I am doing hourly average). If the string is present, simply assign HIGH for that row in the alert column. If no HIGH string found, just assign empty string for that row in the alert column. Something like below
Expected output
hourly | avg | avg | avg | alert
---------------------+--------------------+--------------------+---------------------+----------------
2022-11-26 17:00:00 | 3.52857138642243 | 2.771428568022592 | 0 |
2022-11-26 18:00:00 | 2.484615419346553 | 2.815384602546692 | 0 |
2022-11-26 19:00:00 | 2.218461540570626 | 2.723076921242934 | 0 |
2022-11-26 20:00:00 | 5.098461512992015 | 2.7076923021903405 | 0 | HIGH
2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 | 0 |
2022-11-26 22:00:00 | 5.672307815345434 | 2.810769222332881 | 0 | HIGH
2022-11-26 23:00:00 | 2.7828124976949766 | 2.893749985843897 | 0 |
2022-11-27 00:00:00 | 2.6046153992414474 | 2.8030769238105187 | 0 |
2022-11-27 01:00:00 | 3.846031717837803 | 2.8333333200878568 | 0 | HIGH
...
(25 rows)
I thought of only targeting rows containing HIGH value in alert column by doing something like where alert = 'HIGH', but then it will drop other rows and affect my average calculation.
How can I do this?