2

I am aggregating a table by a file ID field. Each file has a name that matches exactly one file ID.

select file_key, min(fullfilepath)
from table
group by file_key 

Because I know the structure of the table, I know that any fullfilepath will do. Both min and max work, but they take a long time.

I came across this aggregate function, which returns the first value. Unfortunately, it is also slow because it scans the whole table. For example, this is very slow:

select first(file_id) from table;

What is the fastest way to do that, with or without an aggregate function?

2
  • For the first query, try select distinct on (file_key) file_key, fullfilepath from the_table order by file_key, fullfilepath - that might be faster than the group by. Commented Feb 14, 2017 at 13:15
  • wiki.postgresql.org/wiki/First_(aggregate) Commented Jul 3, 2019 at 5:21

3 Answers

6

There is no way to make your first query with the GROUP BY clause faster, because it has to scan the whole table to find all groups.

Your second query can be made faster:

SELECT (
   SELECT file_id FROM "table"
   WHERE file_id IS NOT NULL
   LIMIT 1
);

There is no way to optimize the query as you wrote it, because the aggregate function is a black box to PostgreSQL.


2 Comments

Your last statement is usually true. But PostgreSQL can optimize the query (and use an index) when the aggregate has a defined SORTOP, which min and max have.
That means that you can use the index for SELECT min(field) FROM atable, but not for SELECT min(field) FROM atable GROUP BY anotherfield. Think about it - all different values of anotherfield have to be identified, and how can an index help there? That requires a sequential or index scan over the whole table, and the table scan is usually cheaper there.
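The difference described in these comments can be checked with EXPLAIN. The following is only a sketch: the table atable and its columns are taken from the comment, while the column types and the index name atable_field_idx are assumptions made for illustration. The plan you actually get depends on table size and statistics.

```sql
-- Hypothetical table and index, named after the comment above.
CREATE TABLE atable (anotherfield integer, field integer);
CREATE INDEX atable_field_idx ON atable (field);

-- Without GROUP BY, the planner can answer min() by reading
-- one end of the index instead of scanning the table.
EXPLAIN SELECT min(field) FROM atable;

-- With GROUP BY, every value of anotherfield has to be found,
-- so expect a plan that reads the whole table.
EXPLAIN SELECT anotherfield, min(field) FROM atable GROUP BY anotherfield;
```

Comparing the two plans on your own data is the reliable way to see which case applies.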
2

I doubt that this will help performance, but it may be useful if anyone actually wants a first aggregate.

-- coalesce isn't a function, so make an equivalent function.
create function coalesce_("anyelement","anyelement") returns "anyelement"     
    language sql as $$ select coalesce( $1,$2 ) $$;

create aggregate first("anyelement") (sfunc=coalesce_, stype="anyelement");
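A possible usage sketch for the aggregate defined above, applied to the grouping from the question. The table name the_table, the column types, and the sample rows are assumptions for illustration only. Because the state starts as NULL and coalesce keeps the first non-null argument, the aggregate returns the first non-null value it is fed in each group.

```sql
-- Hypothetical sample table; the column names follow the question.
CREATE TABLE the_table (file_key integer, fullfilepath text);

INSERT INTO the_table VALUES
    (1, '/data/a.txt'),
    (1, '/data/a.txt'),
    (2, '/data/b.txt');

-- One row per file_key, carrying the first non-null
-- fullfilepath that the aggregate sees in that group.
SELECT file_key, first(fullfilepath)
FROM the_table
GROUP BY file_key;
```

This still reads every row of the table, so it is a convenience rather than a speedup.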


-1
select 
    distinct on (file_key) 
    file_key, fullfilepath
from table
order by file_key 

That will return one record for each file_key.
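A self-contained sketch of the same DISTINCT ON approach. The table name files, the index name files_key_path_idx, the column types, and the sample rows are all assumptions for illustration. The index is optional: it may let the planner read rows already sorted instead of sorting them, but the query still visits every row, so check the plan with EXPLAIN before relying on it.

```sql
-- Hypothetical table and index; the column names follow the question.
CREATE TABLE files (file_key integer, fullfilepath text);
CREATE INDEX files_key_path_idx ON files (file_key, fullfilepath);

INSERT INTO files VALUES
    (1, '/data/a.txt'),
    (1, '/data/a.txt'),
    (2, '/data/b.txt');

-- DISTINCT ON keeps the first row of each file_key group,
-- where "first" is determined by the ORDER BY clause.
SELECT DISTINCT ON (file_key) file_key, fullfilepath
FROM files
ORDER BY file_key, fullfilepath;
```

Adding fullfilepath to the ORDER BY makes the chosen row deterministic; with only file_key in the ORDER BY, any row of the group may be returned.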
