6

table1

date_time               | make   | model | miles | reg_no | age_months
----------------------------------------------------------------------
2016-09-28 20:05:03.001 | toyota | prius | 10200 | 1111   | 22
2016-09-28 20:06:03.001 | suzuki | sx4   | 10300 | 1122   | 12
2016-09-28 20:09:03.001 | suzuki | sx4   | 11200 | 1133   | 34
2016-09-28 20:10:03.001 | toyota | prius | 15200 | 1144   | 28
2017-05-28 20:11:03.001 | toyota | prius | 15500 | 1144   | 36

For data in table1 above I wanted do some aggregations on miles per month by model like mean, median, q1, q3, iqr etc.

The query I've is as follows, but it gives error: aggregate functions cannot be nested - what is the right way to go about this ?

select
    model
    , COUNT(DISTINCT reg_no) AS distinct_car_count
    , COUNT(*) AS records_count
    , ROUND(AVG(miles/age_months*1.0),2) AS miles_per_month_avg
    , ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_med
    , ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q1
    , ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q3
    , miles_per_month_q3 - miles_per_month_q1 as miles_per_month_iqr
    , sum(case when miles/age_months*1.0 <  (miles_per_month_q1 - 1.5*miles_per_month_iqr) then 1 else 0 end) as miles_per_month_num_records_outliers_lower_bound
    , sum(case when miles/age_months*1.0 >  (miles_per_month_q3 + 1.5*miles_per_month_iqr) then 1 else 0 end) as miles_per_month_records_outliers_upper_bound
    , ROUND(stddev_pop(miles/age_months*1.0),2) as miles_per_month_stddev

from table1 a
group by model;
3
  • You need to move the three lines starting with , miles_per_month_q into another level. Commented Oct 24, 2016 at 17:27
  • @dnoeth how do we do that ? Commented Oct 25, 2016 at 0:55
  • Possible duplicate of Nesting Aggregate Functions - SQL Commented Jul 15, 2017 at 6:25

2 Answers 2

4

There are two issues:

#1: You can't nest aggregates (as the error message clearly indicates), miles_per_month_q1 is an aggregated column and yu try to use it in another aggregate miles_per_month_num_records_outliers_lower_bound.

#2: You try to reuse a column alias miles_per_month_q1 within the calculation of miles_per_month_iqr, which is not allowed in Standard SQL.

For both cases you need to add another nesting level (i.e. Derived Table or Common Table Expression), in your case it's probably:

SELECT
    a.model
    , Count(DISTINCT reg_no) AS distinct_car_count
    , Count(*) AS records_count
    , Round(Avg(miles/age_months*1.0),2) AS miles_per_month_avg

    -- now you can use the aliases, but you have to add a dummy (it's always the same value for a given model) aggregation function like MIN or MAX    
    , Min(percentiles.miles_per_month_med)
    , Min(percentiles.miles_per_month_q1)
    , Min(percentiles.miles_per_month_q3)
    , Min(percentiles.miles_per_month_q3 - percentiles.miles_per_month_q1) AS miles_per_month_iqr

    -- now it's no more nested aggregation
    , Sum(CASE WHEN miles/age_months*1.0 <  (percentiles.miles_per_month_q1 - 1.5* (percentiles.miles_per_month_q3 - percentiles.miles_per_month_q1)) THEN 1 ELSE 0 end) AS miles_per_month_num_records_outliers_lower_bound
    , Sum(CASE WHEN miles/age_months*1.0 >  (percentiles.miles_per_month_q3 + 1.5* (percentiles.miles_per_month_q3 - percentiles.miles_per_month_q1)) THEN 1 ELSE 0 end) AS miles_per_month_records_outliers_upper_bound

    , Round(StdDev_Pop(miles/age_months*1.0),2) AS miles_per_month_stddev

FROM table1 a
JOIN
 ( -- calculate the nested aggregates first
   SELECT
       model
       , Round(Percentile_Cont(0.5) Within GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_med
       , Round(Percentile_Cont(0.25) Within GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q1
       , Round(Percentile_Cont(0.75) Within GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q3
   FROM table1 a
   GROUP BY model
 ) AS percentiles
ON a.model = percentiles.model
GROUP BY a.model
Sign up to request clarification or add additional context in comments.

Comments

1

This is what is killing you:

, sum(case when miles/age_months*1.0 <  (miles_per_month_q1 - 1.5*miles_per_month_iqr) then 1 else 0 end) as miles_per_month_num_records_outliers_lower_bound
, sum(case when miles/age_months*1.0 >  (miles_per_month_q3 + 1.5*miles_per_month_iqr) then 1 else 0 end) as miles_per_month_records_outliers_upper_bound

You are using SUM on expressions (miles_per_month_q1,miles_per_month_q3) that are built upon another aggregative function - PERCENTILE_CONT

,ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q1
, ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_q3

Split the code to inner query with PERCENTILE_CONT wrraped by an outer query with SUM


suggested solution

select
    model
    , COUNT(DISTINCT reg_no) AS distinct_car_count
    , COUNT(*) AS records_count
    , ROUND(AVG(miles/age_months*1.0),2) AS miles_per_month_avg
    , ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) AS miles_per_month_med
    , min (miles_per_month_q1)  as miles_per_month_q1
    , min (miles_per_month_q3)  as miles_per_month_q3
    , miles_per_month_q3 - miles_per_month_q1 as miles_per_month_iqr
    , sum(case when miles/age_months*1.0 <  (miles_per_month_q1 - 1.5*(miles_per_month_q3 - miles_per_month_q1)) then 1 else 0 end) as miles_per_month_num_records_outliers_lower_bound
    , sum(case when miles/age_months*1.0 >  (miles_per_month_q3 + 1.5*(miles_per_month_q3 - miles_per_month_q1)) then 1 else 0 end) as miles_per_month_records_outliers_upper_bound
    , ROUND(stddev_pop(miles/age_months*1.0),2) as miles_per_month_stddev

from    (select       a.*
                    , ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) over (partition by model) AS miles_per_month_q1
                    , ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY (miles/age_months*1.0) ASC),2) over (partition by model) AS miles_per_month_q3

        from        table1 a
        ) a
        
group by  model
;

2 Comments

can you add some more direction for 'Split the code to inner query with PERCENTILE_CONT wrraped by an outer query with SUM'
@user3206440, please check the suggested solution I added to the answer

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.