Hi all! I have a table in PostgreSQL 9.3 that contains about 24 million records, with the following structure:

CREATE TABLE jtest
(
  recordid uuid NOT NULL,
  details json,
  tsv tsvector,
  CONSTRAINT jtest_pkey PRIMARY KEY (recordid)
)
WITH (
  OIDS=FALSE
);

The tsv column is filled by a trigger based on the json contents: to_tsvector('english', json_extract_path_text(details, 'info')).
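
For reference, such a trigger could look like this (a minimal sketch; the function and trigger names are illustrative, not from the question):

CREATE OR REPLACE FUNCTION jtest_tsv_update() RETURNS trigger AS $$
BEGIN
  -- Extract the 'info' key from the json column and build the tsvector;
  -- coalesce guards against a missing key producing NULL.
  NEW.tsv := to_tsvector('english',
                         coalesce(json_extract_path_text(NEW.details, 'info'), ''));
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER jtest_tsv_trigger
  BEFORE INSERT OR UPDATE ON jtest
  FOR EACH ROW EXECUTE PROCEDURE jtest_tsv_update();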

The info field in the json structure has similar content across records, for example "Milk from manufacturer one" or "Milk from another two". There are about 1 million records that contain "milk".

If I use the following query:

select recordid from jtest where tsv @@ to_tsquery('english', 'milk');

and don't use any index on the tsv column, the query takes about 250 seconds. If I create a GIN index on the tsv column, it takes about 200 seconds.
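
For reference, the GIN index would be created with something like this (the index name is illustrative, since the question doesn't show the DDL):

CREATE INDEX jtest_tsv_idx ON jtest USING gin (tsv);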

Is there a possibility to increase performance?

Comment: Why is someone searching for milk if it's in every record?

1 Answer

The answer lies in the question:

There are about 1 million records that contain "milk".

Index or not, you still need to retrieve those million rows. Also note that if a million rows means most of the table, Postgres will ignore the index entirely and seq scan the whole table.
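
You can confirm what the planner chooses with EXPLAIN (a sketch; the actual plan and timings will vary):

explain (analyze, buffers)
select recordid from jtest where tsv @@ to_tsquery('english', 'milk');

For a common term like 'milk', expect a sequential scan or a bitmap scan that still touches a large share of the table; for a selective term, the plan should switch to a Bitmap Index Scan on the GIN index.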

The index will help if you change the query to:

select recordid from jtest where tsv @@ to_tsquery('english', 'rare & string');