I have inserted a lot of data (more than 2 million documents) into a table and created a full-text search index using GIN, and it works great: I can query the database and retrieve the appropriate documents quickly.

Regularly, I collect new data that I insert into the database. What I would like to do is update my index with the new data only, but I have failed so far. I don't want to drop and recreate the index, because recreating it takes ages. Basically, I would like to do an incremental update of the index. I can update it on the fly while the data is being inserted, but that is very slow. I read that creating an index after the data has been inserted is faster (true), so I assumed that updating the index for just the new data should also be possible. But I haven't managed to do it so far.

I use PostgreSQL 12.

Can anybody help me, please?

1 Answer

There is no way to suspend adding values to the index while you load data.

But GIN indexes already have a feature to optimize exactly that: the GIN fast update technique. Set the gin_pending_list_limit storage parameter on the index to a high value, so that new entries accumulate in the pending list during the load instead of being merged into the main index structure right away. Once you are done with the bulk load, VACUUM the table to integrate the pending list into the main index.
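A minimal sketch of that workflow, assuming a table post with a GIN index named post_search_idx (the index name is a placeholder, not from the original post):

    -- raise the pending-list limit for this index (the value is in kB);
    -- this only has an effect while fastupdate is on, which is the default
    ALTER INDEX post_search_idx SET (gin_pending_list_limit = 65536);

    -- ... run the bulk load here ...

    -- merge the accumulated pending list into the main index structure
    VACUUM post;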

An alternative approach is to use partitioning and load a whole partition at once: create the new partition as a standalone table, load it, create the index on it, and then attach it to the partitioned table.
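A sketch of that approach, under the assumption that post is range-partitioned by a date column created_at (the partitioning scheme and all names here are hypothetical):

    -- build the new partition as a standalone table and load it
    CREATE TABLE post_2024_07 (LIKE post INCLUDING DEFAULTS);
    -- ... bulk load into post_2024_07 here ...

    -- create the GIN index while the table is still standalone (no pending list needed)
    CREATE INDEX ON post_2024_07 USING gin (search_vector);

    -- attach it; the index is attached to a matching partitioned index on the parent
    ALTER TABLE post ATTACH PARTITION post_2024_07
        FOR VALUES FROM ('2024-07-01') TO ('2024-08-01');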

Comments

Thanks a lot, I am going to try that. Just to make sure: I populate my database using a Python script and SQLAlchemy. Then I alter my table to add a search_vector column of type tsvector and create an index on it. By running UPDATE post SET search_vector = (to_tsvector(title) || to_tsvector(content)); I populate the column that the index is built on, which I can then use to query my database. Question: when you say values get added to the index when you load data, does it mean that the next time I run my Python script to load new data, the index will automatically get updated?
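For reference, the workflow described in this comment looks roughly like this (the table and column names are taken from the comment; the index name is a placeholder):

    -- add the column, populate it, and index it
    ALTER TABLE post ADD COLUMN search_vector tsvector;
    UPDATE post SET search_vector = to_tsvector(title) || to_tsvector(content);
    CREATE INDEX post_search_idx ON post USING gin (search_vector);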
Yes, no matter how you modify the table, PostgreSQL will always modify the index along with it, keeping everything consistent. But for performance reasons, a GIN index keeps these modifications in a "pending list", which is a kind of extra overflow area. It is like the stack of recently acquired books in a library that have not yet been put in their proper place: when somebody comes looking for a book, you look in the proper catalog, but you also look through the stack of new arrivals.
But this pending list is eventually included in the index, isn't it? Searches must scan the list of pending entries in addition to the regular index only if a query is submitted during the process; a query submitted once all the data have been imported will only look at the regular index. Is that correct?
As soon as the pending list exceeds gin_pending_list_limit (or a VACUUM is run), the pending list is cleared and the actual index is modified (the library books are put on the shelves). Any query will always consult the pending list (it is part of the index), but if it is empty, there is no overhead.
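If you want to verify that the pending list really is empty after a VACUUM, the pgstattuple extension can report its size; a small sketch, with the index name again being a placeholder:

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- pending_pages and pending_tuples should both be 0 once the list has been merged
    SELECT pending_pages, pending_tuples FROM pgstatginindex('post_search_idx');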