How to do conditional aggregate based on values of another column?

Question

I have a table called device_data that looks like below

     Column     |            Type             | Collation | Nullable |                     Default                     
----------------+-----------------------------+-----------+----------+-------------------------------------------------
 id             | integer                     |           |          | nextval('device_data_id_seq'::regclass)
 date           | timestamp without time zone |           |          | 
 packet_drop    | real                        |           |          | 
 jitter         | real                        |           |          | 
 latency        | real                        |           |          | 
 alert          | character varying(50)       |           |          |

It basically stores the packet drops, jitter, latency on a minute basis.

Now there is another column called alert that holds the value HIGH, MEDIUM and LOW based on a threshold and holds empty string if the threshold is not met.

Now I do an hourly average on the table, my query looks like below

SELECT date_trunc('hour', date) AS hourly,
    avg(jitter),
    avg(latency),
    avg(packet_drop)
FROM device_data
WHERE date BETWEEN '2022-11-26 17:41:11' AND '2022-11-27 17:41:11'
GROUP BY hourly 
ORDER BY hourly;

Output

       hourly        |        avg         |        avg         |         avg         
---------------------+--------------------+--------------------+---------------------
 2022-11-26 17:00:00 |   3.52857138642243 |  2.771428568022592 |                   0
 2022-11-26 18:00:00 |  2.484615419346553 |  2.815384602546692 |                   0
 2022-11-26 19:00:00 |  2.218461540570626 |  2.723076921242934 |                   0
 2022-11-26 20:00:00 |  5.098461512992015 | 2.7076923021903405 |                   0
 2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 |                   0
 2022-11-26 22:00:00 |  5.672307815345434 |  2.810769222332881 |                   0
 2022-11-26 23:00:00 | 2.7828124976949766 |  2.893749985843897 |                   0
 2022-11-27 00:00:00 | 2.6046153992414474 | 2.8030769238105187 |                   0
 2022-11-27 01:00:00 |  3.846031717837803 | 2.8333333200878568 |                   0
 ...
(25 rows)

Next I do an hourly average on the table with additional column alert, so my query looks something like below

SELECT date_trunc('hour', date) as hourly,
    avg(jitter),
    avg(latency),
    avg(packet_drop),
    alert from device_data
WHERE date BETWEEN '2022-11-26 17:41:11' AND '2022-11-27 17:41:11'
GROUP BY hourly, alert
ORDER BY hourly;

Output

       hourly        |        avg         |        avg         |         avg         | alert 
---------------------+--------------------+--------------------+---------------------+----------------
 2022-11-26 17:00:00 | 1.2649999938905239 |  2.755000001192093 |                   0 | 
 2022-11-26 17:00:00 |  48.79999923706055 | 3.0999999046325684 |                   0 | MEDIUM
 2022-11-26 18:00:00 |  2.484615419346553 |  2.815384602546692 |                   0 | 
 2022-11-26 19:00:00 |  2.218461540570626 |  2.723076921242934 |                   0 | 
 2022-11-26 20:00:00 | 2.6603174461495307 |     2.695238092589 |                   0 | 
 2022-11-26 20:00:00 |  106.5999984741211 | 3.0999999046325684 |                   0 | HIGH
 2022-11-26 20:00:00 |  57.20000076293945 | 3.0999999046325684 |                   0 | MEDIUM
 2022-11-26 21:00:00 | 2.0060606116824076 | 2.6469696814363655 |                   0 | 
 2022-11-26 22:00:00 | 2.8349206649831364 |  2.793650784189739 |                   0 | 
 2022-11-26 22:00:00 |  95.05000305175781 |  3.350000023841858 |                   0 | HIGH
 2022-11-26 23:00:00 | 2.7828124976949766 |  2.893749985843897 |                   0 | 
 2022-11-27 00:00:00 |  2.132812526775524 |           2.796875 |                   0 | 
 2022-11-27 00:00:00 |  32.79999923706055 |  3.200000047683716 |                   0 | LOW
 2022-11-27 01:00:00 |  1.849999995962266 |  2.822580631702177 |                   0 | 
 2022-11-27 01:00:00 |  127.5999984741211 |                3.5 |                   0 | HIGH
 ...
(35 rows)

As you can see in my second query, some of the rows have duplicate timestamps like 2022-11-26 17:00:00, 2022-11-26 20:00:00 and so on.

I understand that when averaging out the columns, in the case of alert column, it encounters multiple values be it empty string along with some HIGH, MEDIUM or LOW strings or some other combinations resulting in multiple rows for the same timestamp.

Requirement

What I want is to simply average out the values (initially ignoring the alert column values) and then check for HIGH string in the alert column for that particular hour (I am doing hourly average). If the string is present, simply assign HIGH for that row in the alert column. If no HIGH string found, just assign empty string for that row in the alert column. Something like below

Expected output

       hourly        |        avg         |        avg         |         avg         | alert 
---------------------+--------------------+--------------------+---------------------+----------------
 2022-11-26 17:00:00 |  3.52857138642243   |  2.771428568022592 |                   0 | 
 2022-11-26 18:00:00 |  2.484615419346553  |  2.815384602546692 |                   0 | 
 2022-11-26 19:00:00 |  2.218461540570626  |  2.723076921242934 |                   0 | 
 2022-11-26 20:00:00 |  5.098461512992015  | 2.7076923021903405 |                   0 |  HIGH
 2022-11-26 21:00:00 |  2.0060606116824076 | 2.6469696814363655 |                   0 | 
 2022-11-26 22:00:00 |  5.672307815345434  |  2.810769222332881 |                   0 |  HIGH
 2022-11-26 23:00:00 |  2.7828124976949766 |  2.893749985843897 |                   0 | 
 2022-11-27 00:00:00 |  2.6046153992414474 | 2.8030769238105187 |                   0 | 
 2022-11-27 01:00:00 |  3.846031717837803  | 2.8333333200878568 |                   0 |  HIGH
 ...
(25 rows)

I thought of only targeting rows containing HIGH value in alert column by doing something like where alert = 'HIGH', but then it will drop other rows and affect my average calculation.

How can I do this?

Souvik Ray · Accepted Answer · 2022-11-28 04:35:07Z

3

You need to use an aggregate function for the column 'alert', as well. Since you want to only bring up a certain value, you can use a conditional expression inside the aggregate, like this:

select 
     date_trunc('hour', date) as hourly
   , avg(jitter)
   , avg(latency)
   , avg(packet_drop)
   , coalesce(MAX(case when alert='HIGH' then 'HIGH' end),'') as alert 
from device_data 
where date between '2022-11-26 17:41:11' and '2022-11-27 17:41:11' 
group by hourly order by hourly;

This should work, because the MAX function ignores the NULL values (all alert values other than HIGH will cause NULL result in the case expression.

edited Nov 28, 2022 at 4:35

Souvik Ray

3,0567 gold badges46 silver badges85 bronze badges

answered Nov 27, 2022 at 22:04

tinazmu

5,5442 gold badges12 silver badges25 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Souvik Ray Over a year ago

Btw removed alert from groupby otherwise it was throwing errors since aggregate can't be grouped by.

Collectives™ on Stack Overflow

How to do conditional aggregate based on values of another column?

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related