How to optimize an index for a substring of a column?

For example, I have a column postal_code that stores a string of 5 characters. Most of my queries filter on the first 2 characters, so an index on the whole column does not seem useful.

What if I create an index only on the substring: CREATE INDEX ON index.annonces_parsed (left(postal_code, 2));

Is that a good solution, or is it better to add a new column that stores only the substring and index that column?

A query using this index could be:

select *
from index.cities
where left(postal_code, 2) = '83' -- Will it use the index on the substring?
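
For comparison, the extra-column option could be sketched as below. This assumes PostgreSQL 12 or later (for stored generated columns) and that postal_code is a text-like column; the column name dept_code and the index name are made up for illustration.

```sql
-- Assumption: PostgreSQL 12+; dept_code and cities_dept_code_idx are hypothetical names.
-- The generated column is kept in sync with postal_code by the database itself.
ALTER TABLE index.cities
    ADD COLUMN dept_code text GENERATED ALWAYS AS (left(postal_code, 2)) STORED;

-- A plain B-tree index on the new column.
CREATE INDEX cities_dept_code_idx ON index.cities (dept_code);

-- Queries then filter on the new column directly.
SELECT *
FROM index.cities
WHERE dept_code = '83';
```

The trade-off is extra storage per row in exchange for a simpler predicate, whereas the expression index stores nothing in the table but requires every query to repeat left(postal_code, 2) exactly.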

Thanks so much

  • run explain and check Commented Sep 22, 2022 at 17:11
  • where left(postal_code, 2) = '83' --> where postal_code like '83%'. Then just create a normal index on the column: create index ix1 on cities (postal_code);. Commented Sep 22, 2022 at 17:26
  • The index has to match the query, so please show the query. Commented Sep 22, 2022 at 19:38
  • I would think a normal btree index using a like would be more efficient than a function based index (what @TheImpaler said). Text begins with searches are bread and butter for indexes. Commented Sep 22, 2022 at 20:55
  • As @TheImpaler mentioned, if you look for the first part of the field then a regular btree index on the field will work just fine. If you need to go searching "randomly" inside the field (e.g. WHERE field LIKE '%87%') you probably should have a look at adding a pg_trgm index on the field, it comes with many options I honestly never tried out but it worked wonders for the LIKE query as above on a rather large table. Commented Sep 23, 2022 at 13:19

2 Answers

I have a test table with 20 million records.

Test 1

CREATE INDEX test_a1_idx ON test (a1)

explain analyze 
select * from test 
where left(a1, 2) = '58'

Gather  (cost=1000.00..103565.05 rows=40000 width=12) (actual time=0.429..468.428 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..98565.05 rows=16667 width=12) (actual time=0.114..407.330 rows=29904 loops=3)
        Filter: ("left"(a1, 2) = '58'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.424 ms
Execution Time: 470.472 ms


explain analyze 
select * from test 
where a1 like '58%'

Gather  (cost=1000.00..99284.01 rows=80523 width=12) (actual time=0.990..337.339 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..90231.71 rows=33551 width=12) (actual time=0.233..278.740 rows=29904 loops=3)
        Filter: (a1 ~~ '58%'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.092 ms
Execution Time: 339.259 ms

Test 2

CREATE INDEX test_a1_idx1 ON test (left(a1, 2))

explain analyze 
select * from test 
where left(a1, 2) = '58'

Bitmap Heap Scan on test  (cost=446.43..49455.46 rows=40000 width=12) (actual time=10.507..206.800 rows=89712 loops=1)
  Recheck Cond: ("left"(a1, 2) = '58'::text)
  Heap Blocks: exact=38298
  ->  Bitmap Index Scan on test_a1_idx1  (cost=0.00..436.43 rows=40000 width=0) (actual time=5.450..5.450 rows=89712 loops=1)
        Index Cond: ("left"(a1, 2) = '58'::text)
Planning Time: 0.501 ms
Execution Time: 209.217 ms

explain analyze 
select * from test 
where a1 like '58%'

Gather  (cost=1000.00..99284.01 rows=80523 width=12) (actual time=0.341..334.759 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..90231.71 rows=33551 width=12) (actual time=0.110..287.313 rows=29904 loops=3)
        Filter: (a1 ~~ '58%'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.067 ms
Execution Time: 336.762 ms

Result:

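Note that PostgreSQL cannot use a plain index on a column when the condition wraps that column in a function: in Test 1 the index on a1 was ignored for left(a1, 2) = '58' and the query ran as a parallel sequential scan. An expression (functional) index on exactly the same expression solves this; in Test 2 it turned the plan into a bitmap index scan and cut execution time from about 470 ms to about 209 ms. In these runs neither index was used for the a1 like '58%' query, which stayed at roughly 337 ms in both tests.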
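
A likely reason the like '58%' query ignored test_a1_idx is the database collation: with any collation other than C, a default B-tree index on a text column cannot serve prefix LIKE, and the index has to be built with the text_pattern_ops operator class instead. The following is a sketch under that assumption, not something run against the table above; the index name is hypothetical and a1 is assumed to be of type text.

```sql
-- Assumption: the database uses a non-C collation and a1 is of type text.
-- test_a1_pattern_idx is a hypothetical name.
-- text_pattern_ops compares strings character by character, which is what
-- the planner needs to turn LIKE '58%' into an index range scan.
CREATE INDEX test_a1_pattern_idx ON test (a1 text_pattern_ops);

-- Re-run the prefix query and check whether the plan now uses the index.
EXPLAIN ANALYZE
SELECT *
FROM test
WHERE a1 LIKE '58%';
```

If the plan switches to an index or bitmap index scan, the collation was the cause. A text_pattern_ops index does not serve ordinary <, > comparisons or ORDER BY under the database collation, so the default index on a1 may still be needed for those.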

A GIN index using trigrams (the pg_trgm extension) may also help, particularly for searches that are not anchored at the start of the string.

https://pganalyze.com/blog/gin-index

CREATE INDEX trgm_idx ON index.annonces_parsed USING gin (t gin_trgm_ops);
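
The column t in the statement above appears to be a placeholder carried over from the linked article. Adapted to the question's table, a trigram setup might look like the sketch below; it assumes the pg_trgm extension can be installed, and the index name is hypothetical.

```sql
-- Assumption: pg_trgm is available and you have the privilege to create extensions.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- annonces_postal_code_trgm_idx is a hypothetical name; postal_code is the
-- column from the question.
CREATE INDEX annonces_postal_code_trgm_idx
    ON index.annonces_parsed USING gin (postal_code gin_trgm_ops);

-- An unanchored search, the kind of query a trigram index is meant for.
EXPLAIN ANALYZE
SELECT *
FROM index.annonces_parsed
WHERE postal_code LIKE '%830%';
```

Keep in mind that a pattern from which no trigram can be extracted, such as '%83%', degenerates to a scan of the whole index, so for a pure two-character prefix filter a B-tree or expression index is usually the better fit.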
