15

I have a table like this:

+-----+----------------+
| ID  |  array300      |
+-----+----------------+
| 100 | {110,25,53,..} |
| 101 | {56,75,59,...} |
| 102 | {65,93,82,...} |
| 103 | {75,70,80,...} |
+-----+----------------+

array300 column is an array of 300 elements. I need to have arrays of 100 elements with every element representing the average of 3 elements of array300. For this example the answer will be like:
array100
{62.66,...}
{63.33,...}
{80,...}
{78.33,...}

1
  • See my update to the answer. Commented Dec 12, 2012 at 21:41

4 Answers 4

11

Try something like this:

SELECT id, unnest(array300) as val, ntile(100) OVER (PARTITION BY id) as bucket_num
FROM your_table

This SELECT will give you 300 records per array300 with same id and assing them the bucket_num (1 for firs 3 elements, 2 for next 3, and so on).

Then use this select to get the avg of elements in the bucket:

SELECT id, avg(val) as avg_val
FROM (...previous select here...)
GROUP BY id, bucket_num

Next - just aggregate the avg_val into array:

SELECT id, array_agg(avg_val) as array100
FROM (...previous select here...)
GROUP BY id

Details: unnest , ntile , array_agg , OVER (PARTITION BY )

UPD: Try this function:

CREATE OR REPLACE FUNCTION public.array300_to_100 (
  p_array300 numeric []
)
RETURNS numeric [] AS
$body$
DECLARE
  dim_start int = array_length(p_array300, 1); --size of input array
  dim_end int = 100; -- size of output array
  dim_step int = dim_start / dim_end; --avg batch size
  tmp_sum NUMERIC; --sum of the batch
  result_array NUMERIC[100]; -- resulting array
BEGIN

  FOR i IN 1..dim_end LOOP --from 1 to 100.
    tmp_sum = 0;

    FOR j IN (1+(i-1)*dim_step)..i*dim_step LOOP --from 1 to 3, 4 to 6, ...
      tmp_sum = tmp_sum + p_array300[j];  
    END LOOP; 

    result_array[i] = tmp_sum / dim_step;
  END LOOP; 

  RETURN result_array;
END;
$body$
LANGUAGE 'plpgsql'
IMMUTABLE
RETURNS NULL ON NULL INPUT;

It takes one array300 and outputs one array100. To use it:

SELECT id, array300_to_100(array300)
FROM table1;

If you have any problems understanding it - just ask me.

Sign up to request clarification or add additional context in comments.

1 Comment

Nice application of ntile().
6
+75

Putting the pieces of Igor into another form:

 select id, array300, (
    select array_agg(z) from
    (
        select avg(x) from 
        (
            select x, ntile(array_length(array300,1)/3) over() from unnest(array300) x
        ) y 
        group by ntile
    ) z
) array100
from your_table

For a small example table like this

 id |       array300        
----+-----------------------
  1 | {110,25,53,110,25,53}
  2 | {56,75,59,110,25,53}
  3 | {65,93,82,110,25,53}
  4 | {75,70,80,110,25,53}

the result is:

 id |       array300        |                   array100                    
----+-----------------------+-----------------------------------------------
  1 | {110,25,53,110,25,53} | {(62.6666666666666667),(62.6666666666666667)}
  2 | {56,75,59,110,25,53}  | {(63.3333333333333333),(62.6666666666666667)}
  3 | {65,93,82,110,25,53}  | {(80.0000000000000000),(62.6666666666666667)}
  4 | {75,70,80,110,25,53}  | {(75.0000000000000000),(62.6666666666666667)}
(4 rows)

Edit My first version used a fixes ntile(2). This only worked for source arrays of size 6. I've fixed that by using array_length(array300,1)/3 instead.

Comments

1

I'm not able to answer your question completely, however I have found aggregation function for summing integer arrays. Perhaps someone (or you) can modify it to avg.

Source: http://archives.postgresql.org/pgsql-sql/2005-04/msg00402.php

CREATE OR REPLACE FUNCTION array_add(int[],int[]) RETURNS int[] AS '
  DECLARE
    x ALIAS FOR $1;
    y ALIAS FOR $2;
    a int;
    b int;
    i int;
    res int[];
  BEGIN
    res = x;

    a := array_lower (y, 1);
    b := array_upper (y, 1);

    IF a IS NOT NULL THEN
      FOR i IN a .. b LOOP
        res[i] := coalesce(res[i],0) + y[i];
      END LOOP;
    END IF;

    RETURN res;
  END;
'
LANGUAGE plpgsql STRICT IMMUTABLE;

--- then this aggregate lets me sum integer arrays...

CREATE AGGREGATE sum_integer_array (
    sfunc = array_add,
    basetype = INTEGER[],
    stype = INTEGER[],
    initcond = '{}'
);


Here's how my sample table looked  and my new array summing aggregate
and function:

#SELECT * FROM arraytest ;
 id | somearr
----+---------
 a  | {1,2,3}
 b  | {0,1,2}
(2 rows)

#SELECT sum_integer_array(somearr) FROM arraytest ;
 sum_integer_array
-------------------
 {1,3,5}
(1 row)

Comments

1

Is this any faster?

Edit: This is more elegant:

with  t as (select generate_series(1, 100,1) a , generate_series(101,200,1) b , generate_series(201,300,1) c)

    select 
        id,
        array_agg((array300[a] + array300[b] + array300[c]) / 3::numeric order by a)  as avg
    from 
        t,
        tmp.test2
    group by 
        id

End of edit

Edit2 This is the shortest select I can think of:

select 
    id,
    array_agg((array300[a] + array300[a+100] + array300[a+200]) / 3::numeric order by a)  as avg
from 
    (select generate_series(1, 100,1) a) t,
    tmp.test2
group by 
    id

End of edit2

with 

t as (select generate_series(1, 100,1) a , generate_series(101,200,1) b , generate_series(201,300,1) c)

,u as (
    select 
        id,
        a,
        (array300[a] + array300[b] + array300[c]) / 3::numeric as avg
    from 
        t,
        tmp.test2 /* table with arrays - id, array300 */
    order by 
        id,
        a
 )

select 
    id, 
    array_agg(avg)
from 
    u 
group by 
    id

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.