1

I am writing an sql program that creates a table with columns computed in a function. The function returns the record type. This is what the header looks like:

create or replace function get_items(col1 int, col2 int) returns record

What I would like to be able to do is:

create table table_items as (
with q as (
    select *,
   (SELECT * FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer))
    from 
    table_t as t
    )
    select * from q
);

however, that results in:

ERROR:  subquery must return only one column

To fix the error I changed the code into:

create table table_items as (
with q as (
    select *,
   (SELECT item1 FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer)),
   (SELECT item2 FROM get_items(t.col1, t.col2) AS (item1 integer, item2 integer)) 
    from 
    table_t as t
    )
    select * from q
);

This solution works, though twice as slow when item2 is obtained in addition to item1. I assume that it is because the same query is executed twice. Is there a way to obtain both items having the function invoked only once? Thanks a lot!

3 Answers 3

2

This is a good use case for a lateral join.

create table table_items as 
select *
from table_t as t
cross join lateral
get_items(t.col1, t.col2) as l(item1 integer, item2 integer);
Sign up to request clarification or add additional context in comments.

1 Comment

This seems to be what I needed. Thanks a lot!
1

My guess is that the reason is that returns record is polymorhic. Try with RETURNS TABLE (col1 int, col2 int) or return custom type

Comments

1

A direct implementation of what you planned initially should be faster than the lateral join. As suggested by @esmin: if you're only getting (item1 integer, item2 integer) type of record from the function, it's better to define it up front (here's an online demo for everything below):

drop function if exists get_items;

create type get_items_return_rec as (a int, b int);

create or replace function get_items(col1 int,col2 int) 
  returns get_items_return_rec language plpgsql as $$
begin
    return (col1,col2)::get_items_return_rec;
end $$;

Which will allow you to refer to its fields directly, exactly as you wanted to - calling it just once, splitting into fields later:

explain analyze 
select a,
       b,
       (rec).a,
       (rec).b 
from (
  select a,
         b,
         get_items(a,b) as rec
  from test) subquery;
--                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------
-- Seq Scan on test  (cost=0.00..360098.00 rows=700000 width=16) (actual time=2.244..816.545 rows=700000 loops=1)
-- Planning Time: 0.061 ms
-- JIT:
--   Functions: 2
--   Options: Inlining false, Optimization false, Expressions true, Deforming true
--   Timing: Generation 0.451 ms, Inlining 0.000 ms, Optimization 0.193 ms, Emission 1.807 ms, Total 2.451 ms
-- Execution Time: 841.719 ms

Compare that to the lateral join and your initial solution on the same, 700k row sample:

explain analyze
  select a,
         b,
         (select c from get_items(a,b) as rec(c int,d int)),
         (select d from get_items(a,b) as rec(c int,d int))
  from test;
--                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
-- Seq Scan on test  (cost=0.00..374098.00 rows=700000 width=16) (actual time=6.845..2275.461 rows=700000 loops=1)
--   SubPlan 1
--     ->  Function Scan on get_items rec  (cost=0.25..0.26 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=700000)
--   SubPlan 2
--     ->  Function Scan on get_items rec_1  (cost=0.25..0.26 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=700000)
-- Planning Time: 0.073 ms
-- JIT:
--   Functions: 18
--   Options: Inlining false, Optimization false, Expressions true, Deforming true
--   Timing: Generation 1.311 ms, Inlining 0.000 ms, Optimization 0.319 ms, Emission 6.256 ms, Total 7.886 ms
-- Execution Time: 2300.981 ms
explain analyze
  select a,
         b
  from  test
  cross join lateral
  get_items(a,b) as rec(c int,d int);

--                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
-- Nested Loop  (cost=0.25..24098.25 rows=700000 width=8) (actual time=0.064..1116.634 rows=700000 loops=1)
--   ->  Seq Scan on test  (cost=0.00..10098.00 rows=700000 width=8) (actual time=0.014..44.813 rows=700000 loops=1)
--   ->  Function Scan on get_items rec  (cost=0.25..0.26 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=700000)
-- Planning Time: 0.080 ms
-- Execution Time: 1140.850 ms

If it's speed you're after, see if you can rewrite your function as a plain SQL function instead of using plpgsql. The example is trivial, but it still illustrates the comparative difference you can expect:

drop function if exists get_items;
create or replace function get_items(col1 int,col2 int) 
  returns get_items_return_rec language sql stable as $$
select (col1,col2)::get_items_return_rec
$$;

explain analyze
select a,
       b,
       (rec).a,
       (rec).b
from (
  select a,
         b,
         get_items(a,b) as rec
  from test) subq;
--                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
-- Seq Scan on test  (cost=0.00..10098.00 rows=700000 width=16) (actual time=0.010..64.552 rows=700000 loops=1)
-- Planning Time: 0.085 ms
-- Execution Time: 88.892 ms

Also, check out function volatility levels and if your function isn't causing side-effects, make it at least stable.

3 Comments

thanks a lot for your answer. I compared the two solutions, it turned out that the solution using cross join later was about 4 times faster on a slice of dataset of 1000 rows.
@Marina These aren't mutually exclusive, so you can combine them. Have you looked into switching the function to plain sql?
I have combined them following the example you provided (thanks!). The program execution time is comparable to that which uses record type. I tried changing plpgsql to sql but it throws a syntax error where a variable is declared. It will take me some time to understand how to do. Thank you again for the help.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.