How to index on regex substring in PostgreSQL

Question

I have a table which looks like this:

id	price	barcode
1	300	A_100-15437859603-233
2	200	A_123-49875452222-128
3	180	A_231-21284568323-367
4	150	B_122457

The "barcode" column holds data in two (or more) formats. My queries currently look like this:

SELECT * FROM my_table
WHERE barcode like 'A_%' AND
SUBSTRING(barcode, '-(.*?)-')='15437859603'

This query finds the first row, for example. The table has tens of millions of rows. How can I speed up this regex search in PostgreSQL? Can I create an index on SUBSTRING(barcode, '-(.*?)-')?

Would it already help just to expand the LIKE pattern? For example, something like 'A[_]%[-]15437859603[-]%'?
– JvdV, Commented Sep 16, 2021 at 5:43
@JvdV Good idea, but this would also be a sequential scan, right? How can I avoid that?
– Alireza, Commented Sep 16, 2021 at 5:46

Laurenz Albe · Accepted Answer · 2021-09-16 05:48:02Z

Yes, you can create an index on SUBSTRING(barcode, '-(.*?)-').
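As a quick check of what such an index would store, the expression can be evaluated against the sample barcodes from the question. This is only an illustrative sketch; the literals are copied from the question's table, and the second format has no hyphens, so the expression yields NULL there.

```sql
-- The parenthesized group is what substring() returns for a regex pattern.
SELECT substring('A_100-15437859603-233', '-(.*?)-');  -- 15437859603

-- No hyphen-delimited part in this format, so the result is NULL.
SELECT substring('B_122457', '-(.*?)-');                -- NULL
```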

To support the first condition as well, rewrite it as substr(barcode, 1, 2) = 'A_'. The following two-column index then covers both conditions of the query:

CREATE INDEX ON my_table (
   SUBSTRING(barcode, '-(.*?)-'),
   substr(barcode, 1, 2)
);
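For the planner to consider an expression index, the query has to use the same expressions that appear in the index definition. A sketch of the rewritten query from the question, assuming the two-column index above exists, might look like this. Note that, unlike LIKE 'A_%', where _ matches any single character, the substr comparison matches a literal underscore.

```sql
-- Both predicates repeat the indexed expressions.
SELECT *
FROM my_table
WHERE substring(barcode, '-(.*?)-') = '15437859603'
  AND substr(barcode, 1, 2) = 'A_';
```

Running ANALYZE my_table after creating the index lets PostgreSQL collect statistics on the indexed expressions, which generally helps the planner estimate row counts.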

If the first condition always compares with 'A_', you could also use

CREATE INDEX ON my_table (SUBSTRING(barcode, '-(.*?)-'))
   WHERE substr(barcode, 1, 2) = 'A_';
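A partial index is only considered when the query's WHERE clause implies the index predicate, so the substr condition still has to appear in the query. The following sketch, assuming the partial index above has been created, shows the matching query and one way to check the plan; which plan is actually chosen depends on the data and statistics.

```sql
-- Refresh statistics, including those for the indexed expression.
ANALYZE my_table;

-- EXPLAIN shows the chosen plan without running the query.
-- If the index is used, the plan should show an Index Scan or
-- Bitmap Index Scan rather than a Seq Scan on my_table.
EXPLAIN
SELECT *
FROM my_table
WHERE substring(barcode, '-(.*?)-') = '15437859603'
  AND substr(barcode, 1, 2) = 'A_';
```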
