
I'm trying to determine the best indexes for a table in PostgreSQL. I expect on the order of 10 billion rows and 10 TB of data.

The table has 5 main columns used for filtering and/or sorting:

  • Filtering: 3 columns of binary data stored as bytea
  • Filtering / sorting: 2 columns of type integer
CREATE TABLE tab (
  filter_key_1 bytea,    -- filtering
  filter_key_2 bytea,    -- filtering
  filter_key_3 bytea,    -- filtering
  sort_key_1   integer,  -- filtering & sorting
  sort_key_2   integer   -- filtering & sorting
);

Queries will be:

SELECT * FROM tab WHERE filter_key_1 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM tab WHERE filter_key_2 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM tab WHERE filter_key_3 = $1 ORDER BY sort_key_1, sort_key_2 LIMIT 15;

SELECT * FROM tab WHERE filter_key_1 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM tab WHERE filter_key_2 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM tab WHERE filter_key_3 = $1 AND sort_key_1 <= $2 AND sort_key_2 <= $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;

What are the ideal indexes for this table? How large will they get with ~10 billion rows? How much will they limit write throughput?

Edit

What if I want to add additional queries such as those below? Would the indexes from above hold up?

SELECT * FROM tab WHERE filter_key_1 = $1 AND filter_key_2 = $2 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
SELECT * FROM tab WHERE filter_key_1 = $1 AND filter_key_2 = $2 AND filter_key_3 = $3 ORDER BY sort_key_1, sort_key_2 LIMIT 15;
-- ...
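
My understanding is that serving these combined filters directly would take wider composite indexes, something like the purely illustrative sketch below (same table and column names as above), which is why I wonder whether the three single-key indexes still hold up:

-- Illustrative only: wider indexes that would match the combined filters exactly
CREATE INDEX ON tab (filter_key_1, filter_key_2, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_1, filter_key_2, filter_key_3, sort_key_1, sort_key_2);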

I/O requirements

The workload is read-heavy with few writes.

Read speed is important; write speed is less important (I can live with up to 3 seconds per insert).

  • Read:
    • expecting on average 150 read queries/sec
    • most queries pulling in 100 to 100,000 rows after WHERE and before LIMIT
  • Write:
    • expecting 1 write query per 12 seconds (~0.08 queries/sec)
    • writing 500-1000 rows/query (~42-84 rows/sec)
2 Comments
  • What is perfect here depends on a) how selective the WHERE conditions are, b) what the read/write ratio of the table is, and c) how often your queries run and how important speed is. The indexes can become larger than the table. Commented Sep 30, 2022 at 6:01
  • @LaurenzAlbe low write, heavy read. Read: expecting on average 150 read queries/sec returning 15 rows/query -> 2,250 rows/second. Write: expecting 1 write query per 12 seconds writing 500-1000 rows -> 0.08 queries/second, 42-84 rows/second. Read speed is important. Write speed is less important (can live with up to 3 seconds per insert). Commented Sep 30, 2022 at 6:30

1 Answer


Since you need to run these queries all the time, you will have to optimize them as much as possible. That would mean:

CREATE INDEX ON tab (filter_key_1, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_2, sort_key_1, sort_key_2);
CREATE INDEX ON tab (filter_key_3, sort_key_1, sort_key_2);
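
Each of the queries above can then be answered by walking a single index in ORDER BY order, so the LIMIT stops the scan after 15 matching entries. A quick way to confirm this (a sketch, using the table and column names from the question and a placeholder bytea value) is to check that the plan contains no separate Sort node:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tab
WHERE filter_key_1 = '\x0123'::bytea  -- placeholder value
ORDER BY sort_key_1, sort_key_2
LIMIT 15;
-- Expected shape: Limit -> Index Scan using the (filter_key_1, sort_key_1,
-- sort_key_2) index, with no Sort node in the plan.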

Together, these indexes should be substantially larger than your table: each of the ~10 billion rows appears in all three of them, and every index entry repeats one of the bytea keys plus both integers and per-tuple overhead.
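
Once a representative amount of data is loaded, the sizes can be measured rather than estimated (a sketch; tab is the table name used above):

-- Size of each index on the table:
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'tab';

-- Combined size of all indexes on the table:
SELECT pg_size_pretty(pg_indexes_size('tab'));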


2 Comments

Size might indeed become an issue, especially if we already have 10 TB of just plain data!! I wonder if creating 3 HASH indexes, one on each filter key, might already be enough; reading and sorting 100 to 100,000 rows will bring some overhead, but it still might be acceptable? Then again, doing 150 of those per second with the occasional write in the background too... I wonder. Another clear example of 'no such thing as a free lunch' =)
Yes, it is the 150 per second that will kill you. Otherwise less perfect indexes would do. Of course, there is always the option to change the requirements...
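
To put the hash-index idea from the comments in concrete terms (a sketch with the same hypothetical names): a hash index stores only a fixed-size hash code per entry, so it stays much smaller than the composite b-trees, but it supports only equality lookups, which means every query would have to fetch and sort all 100 to 100,000 matching rows before applying the LIMIT:

-- Smaller indexes, but no help with ORDER BY ... LIMIT:
CREATE INDEX ON tab USING hash (filter_key_1);
CREATE INDEX ON tab USING hash (filter_key_2);
CREATE INDEX ON tab USING hash (filter_key_3);
-- Each probe is a cheap equality lookup, but rows come back unordered, so
-- ORDER BY sort_key_1, sort_key_2 LIMIT 15 forces a sort of every matching
-- row, on each of the ~150 queries per second.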
