Get the data1 and data2 that have the best (highest) fit, preferring non-null values where possible, grouped by f_key.
demo at db<>fiddle
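For context, a minimal items table consistent with the sample results below could look like this (an assumed schema; your real types may differ):

```sql
create table items(
    f_key int,     -- grouping key
    fit   numeric, -- ranking score, higher is better
    data1 text[],  -- nullable array payloads
    data2 text[]
);
```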
```sql
select distinct f_key,
       first_value(data1) over (w1 order by data1 is null, fit desc) as d1,
       first_value(data2) over (w1 order by data2 is null, fit desc) as d2
from items
window w1 as (partition by f_key);
```
| f_key | d1      | d2   |
|-------|---------|------|
| 1     | {a1,a2} | {b3} |
| 2     | {c1,c2} | null |
This emulates an ignore nulls clause, or a filter(where dataX is not null), by sorting nulls behind everything else before grabbing first_value(). Note that false sorts lower than true, so with the default ascending order the rows where dataX is null end up last.
Since that's a window function, not an aggregate, distinct has to deduplicate the results.
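That boolean sort order can be checked in isolation; a minimal sketch with inline throwaway values:

```sql
-- false sorts before true in ascending order,
-- so the row where the expression is false (data present) comes first
select v, v is null as is_null
from (values ('x'), (null)) t(v)
order by v is null;
```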
```sql
with cte1 as (
    select distinct on (f_key) f_key, data1 as top1
    from items
    where data1 is not null
    order by f_key, fit desc),
cte2 as (
    select distinct on (f_key) f_key, data2 as top2
    from items
    where data2 is not null
    order by f_key, fit desc)
select distinct f_key, top1, top2
from items
left join cte1 using (f_key)
left join cte2 using (f_key);
```
It grabs the whole row coinciding with each top fit value per f_key. It's possibly more efficient. To really discuss performance, we'd at least have to know the volume and characteristics of your data sets and what indexes you have in place.
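If this does turn out to be a hot spot, partial indexes matching each CTE's predicate and sort order might help (a sketch; the index names and the assumption that fit is a plain indexable column are mine):

```sql
-- each index serves one "distinct on(f_key) ... order by f_key, fit desc" scan
create index items_data1_fit_idx on items (f_key, fit desc) where data1 is not null;
create index items_data2_fit_idx on items (f_key, fit desc) where data2 is not null;
```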
How do I retrieve the first sub-array from the result of array_agg?
```sql
select f_key,
       (array_agg(data1 order by fit desc) filter (where data1 is not null))[1][:] as d1,
       (array_agg(data2 order by fit desc) filter (where data2 is not null))[1][:] as d2
from items
group by f_key;
```
| f_key | d1        | d2     |
|-------|-----------|--------|
| 1     | {{a1,a2}} | {{b3}} |
| 2     | {{c1,c2}} | null   |
In Postgres, omitting subscripts of a multidimensional array gets you a null:
> an array reference with the wrong number of subscripts yields a null rather than an error.
Note that if there's even one slice in use, all other subscripts become slices too, meaning that [1][:] is the same as [1:1][:], but more importantly [2][:] becomes [1:2][:], not [2:2][:] as you might expect:

> If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified.
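A quick illustration of that rule, using a throwaway 2-D literal:

```sql
select ('{{a,b},{c,d}}'::text[])[2][:]   as rows_1_to_2, -- treated as [1:2][:], both rows
       ('{{a,b},{c,d}}'::text[])[2:2][:] as row_2_only;  -- {{c,d}}
```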
Also, an order by in a subquery will typically give you an ordered array, but it's not guaranteed to survive. Luckily, aggregate functions offer an internal order by.
It's worth underlining that a slice keeps the dimensionality of its source array. Since there's nothing else in there in your case, you can strip that one dimension with array(select unnest(arr)) - it will spit out all atomic elements, then re-collect them into a 1D array. Here's a whole thread just about that one topic of unwrapping Postgres arrays.
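For example, unwrapping the {{b3}} from the earlier output:

```sql
-- unnest() spills out the atomic elements, array(select ...) re-collects them in 1D
select array(select unnest('{{b3}}'::text[])) as d2; -- {b3}, one dimension fewer
```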
Custom aggregates
What you're trying to do is effectively a vertical coalesce():
```sql
select f_key,
       coalesce_agg(data1 order by fit desc) as d1,
       coalesce_agg(data2 order by fit desc) as d2
from items
group by f_key;
```
You can create an aggregate that does exactly that:
```sql
create function coalesce_agg_sfunc(anycompatible, anycompatible)
    returns anycompatible as 'select coalesce($1,$2)' language sql;

create aggregate coalesce_agg(anycompatible)(
    sfunc = coalesce_agg_sfunc,
    stype = anycompatible);
```
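Once created, it returns the first non-null value in the given order; a quick check with inline throwaway data:

```sql
-- state starts as null; coalesce() keeps the first non-null it meets
select coalesce_agg(x order by o) as first_non_null
from (values (1, null::int), (2, 3), (3, 4)) t(o, x); -- 3
```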
If you were on PostgreSQL version 16 or higher, this test seems to suggest the built-in any_value() happens to act exactly the same when given an internal order by. The problem is, the maintainers are free to make it ignore that clause entirely in the future, which would make sense as an optimisation, same as for count(), sum(), avg() or any other commutative, order-insensitive aggregate.
The v14 you're on is currently the oldest supported version. Please plan an upgrade.