I don't think indexes will help here. I'm not sure why, so what follows is just an experiment...
First, we create a table with your structure, and fill it with random data (10,000 rows, each holding a vector of 1,000 values).
CREATE TABLE t
(
id integer /* PRIMARY KEY */,
stuff integer,
large_array_of_values float[]
) ;
CREATE OR REPLACE FUNCTION random_vector() RETURNS float[] AS
$$
select
array_agg(random())
from
generate_series (1, 1000)
$$
LANGUAGE SQL ;
INSERT INTO t
(id, stuff, large_array_of_values)
SELECT
id, random()*10000, random_vector()
FROM
generate_series(1, 10000) AS i(id) ;
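Before going further, it may be worth checking that the table really looks the way we expect. This is only a quick sanity-check sketch; the column aliases (n_rows, min_len, max_len, distinct_v32) are names I made up for illustration.

```sql
-- Sanity check on the generated data.
-- n_rows should be 10000, and min_len / max_len should both be 1000.
-- distinct_v32 should be close to n_rows if every row got its own
-- random vector (i.e. random_vector() was evaluated once per row).
SELECT
    count(*)                                      AS n_rows,
    min(array_length(large_array_of_values, 1))   AS min_len,
    max(array_length(large_array_of_values, 1))   AS max_len,
    count(DISTINCT large_array_of_values[32])     AS distinct_v32
FROM
    t ;
```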
At this point we create one sample index on the value at position [32] of the vector (plus the id!):
CREATE INDEX
idx_32 ON t(id, (large_array_of_values[32]));
Now, we ask PostgreSQL to analyze the following query and explain it:
EXPLAIN ANALYZE
SELECT
avg(large_array_of_values[32])
FROM
t
WHERE
id BETWEEN 5000 and 7500
AND (large_array_of_values[32]) > 0.32 ;
| QUERY PLAN |
| :---------------------------------------------------------------------------------------------------------------------- |
| Aggregate (cost=46.94..46.95 rows=1 width=8) (actual time=54.871..54.871 rows=1 loops=1) |
| -> Bitmap Heap Scan on t (cost=4.91..46.89 rows=17 width=32) (actual time=0.392..1.204 rows=1732 loops=1) |
| Recheck Cond: ((id >= 5000) AND (id <= 7500) AND (large_array_of_values[32] > '0.32'::double precision)) |
| Heap Blocks: exact=20 |
| -> Bitmap Index Scan on idx_32 (cost=0.00..4.91 rows=17 width=0) (actual time=0.364..0.364 rows=1732 loops=1) |
| Index Cond: ((id >= 5000) AND (id <= 7500) AND (large_array_of_values[32] > '0.32'::double precision)) |
| Planning time: 0.405 ms |
| Execution time: 55.013 ms |
dbfiddle here
The query plan uses the index (idx_32) only for a bitmap index scan, and then goes to the heap. It does not perform an index-only scan, which I guess is what you wanted, even though idx_32 is a covering index for this query.
Running VACUUM to make sure that the visibility map was up to date didn't have any effect. I couldn't find any explicit restriction saying that index-only scans must refer to plain columns (and not to Indexes on Expressions), but it appears that they aren't used in this second case.
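For reference, the VACUUM step mentioned above can be reproduced roughly like this. It is a minimal sketch: it assumes the table t is on your search path, and it reads relallvisible from pg_class, which is only the planner's estimate of how many pages are marked all-visible.

```sql
-- Refresh the visibility map and the planner statistics.
VACUUM ANALYZE t ;

-- relpages is the estimated number of heap pages, relallvisible the
-- estimated number of pages marked all-visible. An index-only scan can
-- avoid heap fetches only for all-visible pages.
SELECT
    relpages, relallvisible
FROM
    pg_class
WHERE
    oid = 't'::regclass ;
```

If relallvisible is close to relpages and the plan still shows a Bitmap Heap Scan, the visibility map is probably not what is holding the index-only scan back.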
Comparison with a non-vector column
CREATE TABLE t
(
id integer /* PRIMARY KEY */,
stuff integer,
a_value float
) ;
INSERT INTO t
(id, stuff, a_value)
SELECT
id, random()*10000, random()
FROM
generate_series(1, 10000) AS i(id) ;
CREATE INDEX idx_value ON t(id, a_value);
VACUUM ANALYZE VERBOSE t ;
In this case, the covering index is actually used, and you get an "index-only scan".
EXPLAIN ANALYZE
SELECT
avg(a_value)
FROM
t
WHERE
id BETWEEN 5000 AND 7500 AND (a_value > 0.1) ;
| QUERY PLAN |
| :---------------------------------------------------------------------------------------------------------------------------- |
| Aggregate (cost=103.67..103.69 rows=1 width=8) (actual time=1.139..1.140 rows=1 loops=1) |
| -> Index Only Scan using idx_value on t (cost=0.29..98.05 rows=2251 width=8) (actual time=0.026..0.655 rows=2247 loops=1) |
| Index Cond: ((id >= 5000) AND (id <= 7500) AND (a_value > '0.1'::double precision)) |
| Heap Fetches: 0 |
| Planning time: 0.184 ms |
| Execution time: 1.179 ms |
dbfiddle here
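Going by this comparison, one thing you could try is to copy the array element into a plain column of its own and index that instead. I haven't measured this, so take it as a sketch under assumptions: it runs against the first table (the one with large_array_of_values), and the names v32 and idx_v32 are made up for illustration.

```sql
-- Hypothetical column v32: a plain copy of element [32] of the vector.
ALTER TABLE t ADD COLUMN v32 float ;
UPDATE t SET v32 = large_array_of_values[32] ;

-- Same shape as idx_value in the comparison: two plain columns.
CREATE INDEX idx_v32 ON t(id, v32) ;

-- The UPDATE rewrote every row, so refresh the visibility map
-- and the statistics before looking at the plan.
VACUUM ANALYZE t ;

EXPLAIN ANALYZE
SELECT
    avg(v32)
FROM
    t
WHERE
    id BETWEEN 5000 AND 7500
    AND v32 > 0.32 ;
```

The obvious cost is that the copy has to be kept in sync with the array whenever it changes, for instance with a trigger or, on PostgreSQL 12 and later, with a stored generated column. Whether that is worth it depends on how many vector positions you actually need to query.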