
Please help me with a query. I have two tables. The first stores data on the user's application usage over a period of time.
The second is a table of granules, which I built with a query on the first table.
My tables (posted as screenshots in the original question):

[screenshot: usage table]
[screenshot: granules table]

Example value for the app_info field:

[screenshot: app_info JSON value]
I need a query that, for each granule, builds a JSON array of the programs used during that granule of time. The programs should be grouped by the name or domain_site field, with a nested array of titles. For example, if a user used the browser for 10 minutes, opening different tabs, we should get the following JSON:

    [
      {
        "app_name": "googlechrome",
        "path": "C:\\ProgramFiles\\google\\googlechrome.exe",
        "domain_site": "stackoverfloww.com",
        "count_seconds": 540,
        "titles": [
          {
            "count_seconds": 320,
            "url": "https://stackoverflow.com/questions/40978290/construct-json-object-from-query-with-group-by-sum",
            "title": "Construct json object from query with group by / sum"
          },
          {
            "count_seconds": 220,
            "url": "https://stackoverflow.com/questions/43117033/aggregate-function-calls-cannot-be-nested-postgresql",
            "title": "aggregate function calls cannot be nested postgresql"
          }
        ]
      }
    ]

And if the user used several applications within those 10 minutes, the same statistics are needed for each application, counting the number of seconds it was in use.
I expect to get a table like this:

[screenshot: expected result table]

I can't get such a query to work. Here is what I wrote:

    select
      employee_id,
      date,
      granula_start,
      granula_end,
      (select
         array_agg(json_build_object(
           'seconds', SUM(case
                            when end_time > granula_end
                              then (EXTRACT(MINUTE FROM granula_end) - EXTRACT(MINUTE FROM start_time)) * 60
                            else case
                                   when EXTRACT(MINUTE FROM start_time) = EXTRACT(MINUTE FROM end_time)
                                     then EXTRACT(SECOND FROM end_time) - EXTRACT(SECOND FROM start_time)
                                   else (EXTRACT(MINUTE FROM end_time) - EXTRACT(MINUTE FROM start_time)) * 60
                                 end
                          end),
           'domain_site', m.app_info::jsonb->'domain_site',
           'app_name', m.app_info::jsonb->'app_name',
           'app_type', m.app_info::jsonb->'app_type')) as app_info
       from pps.my_temp m
       where start_time >= granula_start and start_time <= granula_end
       group by m.app_info::jsonb->'domain_site', m.app_info::jsonb->'app_name'
      )
    from granules

But I get this error:

    ERROR: aggregate function calls cannot be nested

Comments:

  • Please do not post your tables and results as images. Stack Overflow's markdown lets you write tables that help anybody wanting to help you quickly prototype what you're experimenting with. You can even build a runnable fiddle, like the one I started to prefill with your data (but I cannot extract anything from your screen captures): please fix and complete it, click the "run" button, then post the resulting URL in your question. (Commented Aug 31 at 16:14)
  • Trying to be kind, I should warn you that I'm surprised your question has not been downvoted yet, given such an abuse of images. For the tables, at least copy-paste from your IDE into the question as a code block (ugly data without headers but with a complete and valid app_info is more helpful than screen captures), then make the effort to add |s to form a markdown table (there are even online services for that, e.g. tabletomarkdown.com/convert-spreadsheet-to-markdown). Then replace your "Example value for field app_info", and paste the error as text. (Commented Aug 31 at 21:54)

2 Answers

Why?

Both array_agg and sum are aggregate functions, hence the ERROR: aggregate function calls cannot be nested.
Each aggregate function needs its own group by (possibly an implicit one: the group by for the array_agg is presumably the granule of the outer query), so PostgreSQL refuses to guess which group by should go with which function. Your best bet is to make it explicit, both for PostgreSQL and for yourself, by dedicating a subquery to each level of grouping.

With a Common Table Expression, this would result in something like:

with
  grouped_by_granule_and_app as
  (
    select sum(…)
    …
    group by granula_id, app
  )
select …, array_agg(…)
from grouped_by_granule_and_app
group by granula_id;

(granula_id being your granula unique key: probably employee_id, date, granula_start)
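The two-level aggregation described above can be sketched outside SQL. This hypothetical Python snippet (with invented sample data) mimics the inner `sum(…) group by granula_id, app` followed by the outer `array_agg(…) group by granula_id`, showing why no nesting is ever needed:

```python
from collections import defaultdict

# Invented sample rows: (granula_id, app, seconds).
rows = [
    ("g1", "chrome", 320),
    ("g1", "chrome", 220),
    ("g1", "explorer", 60),
    ("g2", "chrome", 100),
]

# Step 1: the inner aggregate — SUM(seconds) GROUP BY granula_id, app.
per_app = defaultdict(int)
for granula_id, app, seconds in rows:
    per_app[(granula_id, app)] += seconds

# Step 2: the outer aggregate — array_agg of the pre-summed rows GROUP BY granula_id.
per_granule = defaultdict(list)
for (granula_id, app), total in per_app.items():
    per_granule[granula_id].append({"app": app, "seconds": total})

# Each aggregate runs at its own grouping level, one after the other.
print(per_granule["g1"])  # [{'app': 'chrome', 'seconds': 540}, {'app': 'explorer', 'seconds': 60}]
```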

How?

Now that we're done with theory, in practice it's still hard to link your query to your (truncated) expected results, and other quirks may prevent your query from working even after the aggregate functions are separated:

  • EXTRACT(MINUTE FROM granula_end) - EXTRACT(MINUTE FROM start_time) returns a negative duration for the 18:55 - 19:00 granule
  • pps.my_temp is not correlated at all with from granules, notably by their common employee_id, so you'll sum up rows of my_temp unrelated to granules.employee_id
  • where start_time >= granula_start and start_time <= granula_end is probably not what you want for "programs used during the granule of time".
    It means "programs started during the granule of time" (after correcting the second <= to <; otherwise a start_time of exactly xx:x5:00.000 is counted once in its preceding granule and once in its following granule).
    If you want the intersection of a granule with the program's running span, you need an overlap formula: where start_time < granula_end and end_time > granula_start
  • storing date and time separately makes your model risky for cross-midnight user sessions
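The overlap arithmetic above can be checked numerically. This minimal Python sketch (timestamps invented for illustration) implements the overlap test and the intersection duration `least(granula_end, end_time) - greatest(granula_start, start_time)`:

```python
from datetime import datetime

def overlap_seconds(start_time, end_time, granula_start, granula_end):
    """Seconds of intersection between a log span and a granule (0 if disjoint)."""
    # Overlap test: start_time < granula_end AND end_time > granula_start.
    if not (start_time < granula_end and end_time > granula_start):
        return 0.0
    # Intersection = least of the ends minus greatest of the starts.
    return (min(end_time, granula_end) - max(start_time, granula_start)).total_seconds()

g_start = datetime(2024, 1, 1, 18, 55)
g_end   = datetime(2024, 1, 1, 19, 0)

# A span crossing the hour boundary, 18:56 -> 19:02, overlaps the granule for 4 minutes;
# the MINUTE-extraction arithmetic of the original query would go negative here.
print(overlap_seconds(datetime(2024, 1, 1, 18, 56), datetime(2024, 1, 1, 19, 2), g_start, g_end))  # 240.0

# A span starting exactly at granula_end belongs to the next granule, not this one.
print(overlap_seconds(datetime(2024, 1, 1, 19, 0), datetime(2024, 1, 1, 19, 5), g_start, g_end))  # 0.0
```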

So here is a completely reworked query that may answer your need (at least it returns a JSON structure resembling the expected results in your question).
Note that the extensive use of CTEs keeps the step-by-step aggregation reasonably clear:

with
  -- Intersections between a log (app + time span + url) and a (user-centric) granule:
  inter as
  (
    select
      m.app_info->'domain_site' domain_site, m.app_info->'app_name' app_name, m.app_info->'path' path, m.app_info->'app_type' app_type,
      g.employee_id, g.date, g.granula_start,
      m.app_info->'url' url,
      m.app_info->'title' title,
      extract(epoch from least(granula_end, end_time) - greatest(granula_start, start_time)) count_seconds
   from granules g join pps.my_temp m
     on g.employee_id = m.employee_id -- Do not mix granules for this employee with other employees!
    and g.date = m.date and start_time < granula_end and end_time > granula_start -- Formula for overlaps of two periods. /!\ still won't work for periods spanning over midnight
  ),
  -- By app and granule:
  app as
  (
    select
      domain_site, app_name, path, app_type,
      employee_id, date, granula_start,
      sum(count_seconds) count_seconds, -- ← Here…
      json_agg                          -- ← … and here we have aggregate functions, but _not nested_: they are just two different views of the same group by.
      (
        (select r from (select count_seconds, url, title) r) -- Trick to avoid repeating column names as fields, see https://stackoverflow.com/a/24970711/1346819
        order by count_seconds desc -- Start with the most used URLs
      ) titles
    from inter
    group by 1, 2, 3, 4, 5, 6, 7
  ),
/*
select *
from app
order by employee_id, date, granula_start, count_seconds desc;
*/
  -- Need another level of grouping? No problem, let's do it by granule only:
  gra as
  (
    select
      employee_id, date, granula_start,
      json_agg
      (
        (select r from (select app_name, path, domain_site, count_seconds, titles) r)
        order by count_seconds desc -- Start with the most used apps
      ) apps
    from app
    group by 1, 2, 3
  )
select *
from gra
order by employee_id, date, granula_start;

which returns something like this for the first granule (full running example available at this db<>fiddle):

[
   {
      "app_name":null,
      "path":"C:\\Windows\\explorer.exe",
      "domain_site":null,
      "count_seconds":18.010000,
      "titles":[
         {
            "count_seconds":9.011000,
            "url":"https://stackoverflow.com/questions/40978290/construct-json-object-from-query-with-group-by-sum",
            "title":null
         },
         {
            "count_seconds":6.005000,
            "url":"https://stackoverflow.com/questions/40978290/construct-json-object-from-query-with-group-by-sum",
            "title":null
         },
         {
            "count_seconds":2.994000,
            "url":"https://stackoverflow.com/questions/43117033/aggregate-function-calls-cannot-be-nested-postgresql",
            "title":null
         }
      ]
   },
   {
      "app_name":null,
      "path":"C:\\Windows\\System32\\mm…",
      "domain_site":null,
      "count_seconds":15.041000,
      "titles":[
         {
            "count_seconds":4.848000,
            "url":"https://stackoverflow.com/questions/43117033/aggregate-function-calls-cannot-be-nested-postgresql",
            "title":null
         },
         …
      ]
    },
    …
]

(note that URLs can appear multiple times: a user can open the same URL in two tabs, which is accentuated in my test dataset by artificially duplicated URLs)


4 Comments

array_agg() and sum() are aggregates, json_build_object() isn't.
Whoops, you're right. I only read the expected result and assumed the OP wanted something like json_agg() to generate his titles, so I picked the first json function I saw in his code and copy-pasted it here without thinking. But I realize the question is a mess, and even after separating the two layers of aggregation I'm not sure it will work: even the case looks suspicious, unable to handle periods crossing a plain hour (if start_time is 18:56 and granula_end is 19:00, we get a negative duration of -56 minutes!).
No worries. I'm not yet sure my array() swap is enough to make this work either, the question could definitely use a cleanup so I didn't read into it beyond the nested aggregate error. If you just straighten up that one thing in your post, I have no complaints. Going an extra mile to address OP's intended logic behind the case will get you my upvote.
I finally preferred retro-engineering the whole problem :-) At least I have working code, and I even think it makes sense from a functional point of view.

If you're absolutely sure you want a SQL array of json-type values (a json[], aka _json), you can wrap the query in an array() constructor, which will simply collect all rows from that subquery.

select
  employee_id,
  date,
  granula_start,
  granula_end,
  array(   -- here, outside,
    select -- instead of `array_agg()` inside and around `sum()`
      json_build_object(
        'seconds', SUM(case
                         when end_time > granula_end
                           then (EXTRACT(MINUTE FROM granula_end) - EXTRACT(MINUTE FROM start_time)) * 60
                         else case
                                when EXTRACT(MINUTE FROM start_time) = EXTRACT(MINUTE FROM end_time)
                                  then EXTRACT(SECOND FROM end_time) - EXTRACT(SECOND FROM start_time)
                                else (EXTRACT(MINUTE FROM end_time) - EXTRACT(MINUTE FROM start_time)) * 60
                              end
                       end),
        'domain_site', m.app_info::jsonb->'domain_site',
        'app_name', m.app_info::jsonb->'app_name',
        'app_type', m.app_info::jsonb->'app_type') as app_info
    from pps.my_temp m
    where start_time >= granula_start and start_time <= granula_end
    group by m.app_info::jsonb->'domain_site', m.app_info::jsonb->'app_name'
  )
from granules

This avoids direct nesting of sum() inside array_agg() that threw the error. You could just as well work around it by nesting queries deeper, but this fix is nice and minimally invasive.

If you actually want a json array of objects, use a json_agg() instead, adding a layer of nesting.
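As a loose Python analogy (illustration only, not PostgreSQL semantics) for the difference between the two: array() over a json-producing subquery gives a SQL array whose elements are independent json documents, while json_agg() yields one json document that is itself an array:

```python
import json

# Invented sample rows for illustration.
rows = [{"app_name": "chrome", "seconds": 540}, {"app_name": "explorer", "seconds": 60}]

# ~ array(select json_build_object(...)): a list of independent json documents.
sql_array_of_json = [json.dumps(r) for r in rows]

# ~ json_agg(...): a single json document that is an array of objects.
json_array = json.dumps(rows)

print(sql_array_of_json[0])  # {"app_name": "chrome", "seconds": 540}
print(json_array)            # [{"app_name": "chrome", "seconds": 540}, {"app_name": "explorer", "seconds": 60}]
```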

Also, if you don't mind losing non-significant whitespace and reordered, deduplicated keys, use jsonb. It's lighter, faster, more flexible, better supported and maintained. Plain json is just pre-validated text.
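As a rough analogy for the key deduplication mentioned above (again illustration only, not a statement about PostgreSQL internals): Python's json parser also keeps only the last occurrence of a duplicate key, much like jsonb, whereas PostgreSQL's plain json would preserve the input text verbatim:

```python
import json

# Input text with a duplicate key: plain json (pre-validated text) keeps this
# string as-is; jsonb, like Python's parser here, keeps only the last value.
doc = '{"app_name": "chrome", "app_name": "firefox", "count_seconds": 10}'
parsed = json.loads(doc)
print(parsed)  # {'app_name': 'firefox', 'count_seconds': 10}
```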
