0

I have a table that has this schema:

create table mytable (creation_date timestamp,
                      value int,
                      category int);

I want the maximum ocurrence of a value every each hour per category, Only on week days. I had made some progress, I have a query like this now:

select category,foo.h as h,value, count(value) from mytable, (
      select date_trunc('hour', 
                        '2000-01-01 00:00:00'::timestamp+generate_series(0,23)*'1 hour'::interval)::time as h) AS foo 
      where date_part('hour',creation_date) = date_part('hour',foo.h) and
            date_part('dow',creation_date) > 0 and date_part('dow',creation_date) < 6
      group by category,h,value;

as result I got something like this:

category |    h     |  value  | count 
---------+----------+---------+-------
       1 | 00:00:00 |       2 |     1
       1 | 01:00:00 |       2 |     1
       1 | 02:00:00 |       2 |     6
       1 | 03:00:00 |       2 |    31
       1 | 03:00:00 |       3 |    11
       1 | 04:00:00 |       2 |    21
       1 | 04:00:00 |       3 |     9
       1 | 13:00:00 |       1 |    14
       1 | 14:00:00 |       1 |    10
       1 | 14:00:00 |       2 |     7
       1 | 15:00:00 |       1 |    52

for example at 04:00 I have to values 2 and 3, with counts of 21 and 9 respectively, I only need the value with highest count which would be the statiscal mode.

BTW I have more than 2M records

2
  • 1
    What version of postgres are you using and what would you want to happen if there are two different values which have the same count? eg: if 1,04:00:00,3,11 was 1,04:00:00,3,21 instead? Commented Feb 3, 2012 at 19:42
  • PostgreSQL 8.4, if two values have the same count I need the lowest value. Commented Feb 3, 2012 at 19:46

2 Answers 2

2

This can be simpler:

SELECT DISTINCT ON (category, extract(hour FROM creation_date)::int)
       category
     , extract(hour FROM creation_date)::int AS h
     , count(*)::int AS max_ct
     , value
FROM   mytable
WHERE  extract(isodow FROM creation_date) < 6 -- no sat or sun
GROUP  BY 1,2,4
ORDER  BY 1,2,3 DESC;

Basically these are the steps:

  1. Exclude weekends (WHERE ...). Use ISODOW to simplify the expression.
  2. Extract hour from timestamp as h.
  3. Group by category, h and value.
  4. Count the rows per combination of the three; cast to integer - we don't need bigint.
  5. Order by category, h and the highest count (DESC).
  6. Only pick the first row (highest count) per (category, h) with the according category.

I am able to do this in one query level, because DISTINCT is applied after the aggregate function.

The result will hold no rows for any (category, h) without no entries at all. If you need to fill in the blanks, LEFT JOIN to this:

SELECT c.category, h.h
FROM   cat_tbl c
CROSS  JOIN (SELECT generate_series(0, 23) AS h) h
Sign up to request clarification or add additional context in comments.

Comments

1

Given the size of your table, I'd be tempted to use your query to build a temporary table, then run a query on that to finalise the results.

Assuming you called the temporary table "summary_table", the following query should do it.

select
    category, h, value, count
from
    summary_table s1
where
    not exists 
    (select * from summary_table s2
        where s1.category = s2.category and
                  s1.h = s2.h and
                   (s1.count < s2.count 
            OR (s1.count = s2.count and s1.value > s2.value));

If you don't want to create a table, you could use a WITH clause to attach your query to this one.

with summary_table as (
select category,foo.h as h,value, count(value) as count from mytable, (
      select date_trunc('hour', 
                        '2000-01-01 00:00:00'::timestamp+generate_series(0,23)*'1 hour'::interval)::time as h) AS foo 
      where date_part('hour',creation_date) = date_part('hour',foo.h) and
            date_part('dow',creation_date) > 0 and date_part('dow',creation_date) < 6
      group by category,h,value)
select
    category, h, value, count
from
    summary_table s1
where
    not exists 
    (select * from summary_table s2
        where s1.category = s1.category and
                  s1.h = s2.h and
                   (s1.count < s2.count 
            OR (s1.count = s2.count and s1.value > s2.value));

1 Comment

Thanks, but the query didn't work, it worked if I put the second subquery in the from clause, like this: from summary_table s1 (select * from summary_table) as s2 where where s1.category = s1.category and s1.h = s2.h and (s1.count < s2.count OR (s1.count = s2.count and s1.value > s2.value));

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.