I have a really large 2D array that I computed with Apache MADlib, and I would like to apply an operation to each 1D array in that 2D array.

I found code in this related answer that unnests it. However, that code is miserably slow on an array this large (150,000+ 1D float arrays): while unnest() takes only a few seconds to run, the code had not completed even after several minutes.

Surely, there must be a faster way to unnest the large 2D array into smaller 1D arrays? Bonus points if the solution uses Apache MADlib. I did find one lead buried in the documentation, a function called deconstruct_2d_array. However, when I call it on the matrix, it fails with the following error:

ERROR: Function "deconstruct_2d_array(double precision[])": Invalid type conversion. Internal composite type has more elements than backend composite type.

2 Answers

The function you found in my old answer does not scale well for big arrays. I never had arrays of your size in mind; data that large should probably be a set (a table) instead.

Be that as it may, this PL/pgSQL function can replace the one in the referenced answer. It requires Postgres 9.1 or later.

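CREATE OR REPLACE FUNCTION unnest_2d_1d(ANYARRAY, OUT a ANYARRAY)
  RETURNS SETOF ANYARRAY
  LANGUAGE plpgsql IMMUTABLE STRICT AS
$func$
BEGIN
   -- SLICE 1 iterates over one-dimensional slices, so "a" holds one row per pass
   FOREACH a SLICE 1 IN ARRAY $1 LOOP
      RETURN NEXT;  -- emit the current value of the OUT parameter "a"
   END LOOP;
END
$func$;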

In my test on a big 2D array in Postgres 9.6, this was 40x faster than the function in the referenced answer.
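
For a quick check, the function can be called on a small literal array. This is only a sketch: the input values and the alias names t and row_1d are made up for illustration and are not part of the original answer.

```sql
-- Illustrative call: each row of the 2D array should come back as its own 1D array.
-- The alias t(row_1d) is a hypothetical name chosen for readability.
SELECT t.row_1d
FROM   unnest_2d_1d('{{1,2,3},{4,5,6},{7,8,9}}'::float8[]) AS t(row_1d);
```

Because the function is polymorphic (ANYARRAY), the same call should work for other element types, for example int[] or text[].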

The function is declared STRICT to avoid an exception on NULL input (as commented by IamIC):

ERROR: FOREACH expression must not be null
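
To see the effect of STRICT, one can pass a NULL array, assuming the function is defined as above. With STRICT, Postgres skips the function body for NULL arguments, so a set-returning function like this one should simply return no rows instead of raising the error.

```sql
-- With STRICT, a NULL argument short-circuits: the function body never runs,
-- so FOREACH never sees NULL and the call yields an empty set.
SELECT count(*) AS rows_returned
FROM   unnest_2d_1d(NULL::float8[]);
```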

2 Comments

Thanks, @Erwin, very useful! It just needs to be marked as STRICT.
@IamIC: Thanks, I added that.

There is now a built-in MADlib function for this, array_unnest_2d_to_1d, introduced in the 1.11 release: http://madlib.incubator.apache.org/docs/latest/array__ops_8sql__in.html#af057b589f2a2cb1095caa99feaeb3d70

Here is an example usage:

CREATE TABLE test1 (pid int, points double precision[]);
INSERT INTO test1 VALUES
(100,  '{{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}, {7.0, 8.0, 9.0}}'),
(101,  '{{11.0, 12.0, 13.0}, {14.0, 15.0, 16.0}, {17.0, 18.0, 19.0}}'),
(102,  '{{21.0, 22.0, 23.0}, {24.0, 25.0, 26.0}, {27.0, 28.0, 29.0}}');
SELECT * FROM test1;

produces

 pid |               points               
-----+------------------------------------
 100 | {{1,2,3},{4,5,6},{7,8,9}}
 101 | {{11,12,13},{14,15,16},{17,18,19}}
 102 | {{21,22,23},{24,25,26},{27,28,29}}
(3 rows)

Then call the unnest function:

SELECT pid, (madlib.array_unnest_2d_to_1d(points)).* 
FROM test1 ORDER BY pid, unnest_row_id;

produces

 pid | unnest_row_id | unnest_result 
-----+---------------+---------------
 100 |             1 | {1,2,3}
 100 |             2 | {4,5,6}
 100 |             3 | {7,8,9}
 101 |             1 | {11,12,13}
 101 |             2 | {14,15,16}
 101 |             3 | {17,18,19}
 102 |             1 | {21,22,23}
 102 |             2 | {24,25,26}
 102 |             3 | {27,28,29}
(9 rows)

where unnest_row_id is the row index into the 2D array (starting at 1 in the output above).
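
Since unnest_row_id is an ordinary output column, it can be used to pick out individual rows of each matrix. The query below is a sketch that reuses only the test1 table and the column names shown above; the subquery alias u is made up for illustration.

```sql
-- Keep only the second row of each 2D array in test1.
-- "u" is a hypothetical alias for the expanded result.
SELECT pid, unnest_result
FROM (
    SELECT pid, (madlib.array_unnest_2d_to_1d(points)).*
    FROM   test1
) AS u
WHERE  unnest_row_id = 2
ORDER  BY pid;
```

Based on the output listed above, this should return the rows {4,5,6}, {14,15,16} and {24,25,26}.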

1 Comment

Hi, is there an easy way to install MADlib on Ubuntu?
