POSTGRESQL: How to optimize index for substring of a column?

Question

How can I optimize an index for a substring of a column?

For example, take a column postal_code that stores a 5-character string. If most of my queries filter on the first 2 characters, an index on the whole column is not useful.

What if I create an index only on the substring: CREATE INDEX ON index.annonces_parsed (left(postal_code, 2))

Is this a good solution, or is it better to add a new column that stores only the substring and index that column?

A query using this index could be:

select *
from index.cities
where left(postal_code, 2) = '83'; -- Will it use the index on the substring?

Thanks so much

Replace where left(postal_code, 2) = '83' with where postal_code like '83%'. Then just create a normal index on the column: create index ix1 on cities (postal_code);
– The Impaler, Commented Sep 22, 2022 at 17:26
I would think a normal btree index used with LIKE would be more efficient than a function-based index (what @TheImpaler said). "Begins with" text searches are bread and butter for indexes.
– Hambone, Commented Sep 22, 2022 at 20:55
As @TheImpaler mentioned, if you search on the first part of the field, a regular btree index on the field will work just fine. If you need to search at arbitrary positions inside the field (e.g. WHERE field LIKE '%87%'), you should probably look at adding a pg_trgm index on the field. It comes with many options that I honestly never tried out, but it worked wonders for a LIKE query like the one above on a rather large table.
– deroby, Commented Sep 23, 2022 at 13:19
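
One caveat to the comments above: a plain B-tree index serves LIKE 'prefix%' only when the column uses the C collation. With any other collation, PostgreSQL needs the index to be built with a pattern operator class such as text_pattern_ops. The sketch below assumes a table cities with a text column postal_code; the index name is hypothetical.

```sql
-- Assumption: cities(postal_code text) in a database that does not use the C locale.
-- text_pattern_ops compares strings character by character, which is what
-- a left-anchored LIKE needs. The index name is made up for this example.
CREATE INDEX cities_postal_code_pattern_idx
    ON cities (postal_code text_pattern_ops);

-- Refresh planner statistics before checking the plan.
ANALYZE cities;

-- Inspect the plan to see whether the new index is chosen.
EXPLAIN
SELECT *
FROM cities
WHERE postal_code LIKE '83%';
```

An index built this way still supports equality, but not ordinary <, <=, > or >= comparisons under a non-C collation, so a second index with the default operator class may be needed if such queries exist.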

Ramin Faracov · Accepted Answer · 2022-10-08 14:03:45Z

I have a test table with 20 million records.

Test 1

CREATE INDEX test_a1_idx ON test (a1);

explain analyze 
select * from test 
where left(a1, 2) = '58';

Gather  (cost=1000.00..103565.05 rows=40000 width=12) (actual time=0.429..468.428 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..98565.05 rows=16667 width=12) (actual time=0.114..407.330 rows=29904 loops=3)
        Filter: ("left"(a1, 2) = '58'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.424 ms
Execution Time: 470.472 ms


explain analyze 
select * from test 
where a1 like '58%';

Gather  (cost=1000.00..99284.01 rows=80523 width=12) (actual time=0.990..337.339 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..90231.71 rows=33551 width=12) (actual time=0.233..278.740 rows=29904 loops=3)
        Filter: (a1 ~~ '58%'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.092 ms
Execution Time: 339.259 ms

Test 2

CREATE INDEX test_a1_idx1 ON test (left(a1, 2));

explain analyze 
select * from test 
where left(a1, 2) = '58';

Bitmap Heap Scan on test  (cost=446.43..49455.46 rows=40000 width=12) (actual time=10.507..206.800 rows=89712 loops=1)
  Recheck Cond: ("left"(a1, 2) = '58'::text)
  Heap Blocks: exact=38298
  ->  Bitmap Index Scan on test_a1_idx1  (cost=0.00..436.43 rows=40000 width=0) (actual time=5.450..5.450 rows=89712 loops=1)
        Index Cond: ("left"(a1, 2) = '58'::text)
Planning Time: 0.501 ms
Execution Time: 209.217 ms

explain analyze 
select * from test 
where a1 like '58%';

Gather  (cost=1000.00..99284.01 rows=80523 width=12) (actual time=0.341..334.759 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..90231.71 rows=33551 width=12) (actual time=0.110..287.313 rows=29904 loops=3)
        Filter: (a1 ~~ '58%'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.067 ms
Execution Time: 336.762 ms

Result:

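PostgreSQL does not use a plain index on a column when the condition applies a function to that column, as Test 1 shows. An expression (functional) index on the same function call fixes this: in Test 2 the left(a1, 2) = '58' query switched from a parallel sequential scan to a bitmap index scan and ran in about 209 ms instead of about 470 ms. The like '58%' query used neither index in these tests.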
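
For the other option raised in the question, a separate column holding only the substring, a stored generated column is one way to keep that column in sync automatically. This is a sketch rather than something measured in the tests above. It assumes PostgreSQL 12 or later and that a1 is a text column; the names a1_prefix and test_a1_prefix_idx are hypothetical.

```sql
-- Assumption: PostgreSQL 12+ and test(a1 text).
-- The column and index names are made up for this example.
-- Adding a stored generated column rewrites the table, which can take
-- a while on a large table.
ALTER TABLE test
    ADD COLUMN a1_prefix text GENERATED ALWAYS AS (left(a1, 2)) STORED;

-- An ordinary B-tree index on the new column.
CREATE INDEX test_a1_prefix_idx ON test (a1_prefix);

-- Refresh planner statistics for the new column.
ANALYZE test;

-- Queries filter on the new column instead of repeating the expression.
EXPLAIN
SELECT *
FROM test
WHERE a1_prefix = '58';
```

The trade-off is roughly this: the expression index needs no schema change, but queries must repeat the indexed expression exactly; the generated column gives queries a simple column to filter on, at the cost of extra storage in every row.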

O'Rooney · Accepted Answer · 2023-09-06 02:17:45Z

Looks like a GIN index using "trigrams" will help you.

https://pganalyze.com/blog/gin-index

CREATE INDEX trgm_idx ON index.annonces_parsed USING gin (t gin_trgm_ops);
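
A slightly fuller sketch of the same idea, adapted to the question: it assumes the indexed column is postal_code (the column t above does not appear in the question) and that the pg_trgm extension can be installed in the database. The index name is hypothetical.

```sql
-- gin_trgm_ops is provided by the pg_trgm extension, which must be
-- installed once per database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Assumption: the column to search is postal_code; the index name is made up.
CREATE INDEX annonces_parsed_postal_code_trgm_idx
    ON index.annonces_parsed USING gin (postal_code gin_trgm_ops);

-- Refresh planner statistics before checking the plan.
ANALYZE index.annonces_parsed;

-- Unlike a B-tree, a trigram index does not require a left-anchored pattern.
EXPLAIN
SELECT *
FROM index.annonces_parsed
WHERE postal_code LIKE '%830%';
```

Trigram indexes work on three-character sequences, so a pattern with only one or two characters between the wildcards gives the index little or nothing to match on; for a pure prefix filter such as the one in the question, a B-tree or expression index is likely the simpler choice.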
