I am implementing a table that has a column with a data type of tsvector. What index would be better to use?

GIN or GiST?

In looking through the PostgreSQL documentation here, I seem to get that:

  • GiST is faster to update and build the index and less accurate than GIN.

  • GIN is slower to update and build the index, but it is more accurate.

OK, so why would anybody want a GiST-indexed field over GIN if GiST could give you the wrong results? There must be some advantage (outside performance) to this.

In layman's terms, when would I want to use GIN vs. GiST?

Comment: Always provide your version of Postgres. GIN has received major improvements in Postgres 9.4. (Commented Mar 10, 2015 at 23:46)

1 Answer

I don't think I could explain it better than the manual already does:

In choosing which index type to use, GiST or GIN, consider these performance differences:

  • GIN index lookups are about three times faster than GiST

  • GIN indexes take about three times longer to build than GiST

  • GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled [...]

  • GIN indexes are two-to-three times larger than GiST indexes

The link and quote refer to the manual for PostgreSQL 9.4. Size and performance estimates seemed slightly outdated already. With PostgreSQL 9.4 the odds have shifted substantially in favor of GIN. The release notes of PostgreSQL 9.4 include:

  • Reduce GIN index size (Alexander Korotkov, Heikki Linnakangas) [...]

  • Improve speed of multi-key GIN lookups (Alexander Korotkov, Heikki Linnakangas)

Size and performance estimates have since been removed from the manual.
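
Since the manual no longer gives numbers, one option is to measure both index types on your own data. The following is only a sketch: it prints SQL that could be pasted into psql, and the table name `documents` and column name `tsv` are hypothetical placeholders, not anything from the question.

```python
# Hypothetical names; replace with your own table and tsvector column.
TABLE = "documents"
COLUMN = "tsv"


def benchmark_sql(table: str, column: str) -> str:
    """Build a psql script that creates, sizes, and drops a GIN and a GiST index."""
    lines = ["\\timing on"]  # psql meta-command: report how long each statement takes
    for method in ("gin", "gist"):
        index = f"{table}_{column}_{method}"
        lines += [
            f"CREATE INDEX {index} ON {table} USING {method} ({column});",
            f"SELECT pg_size_pretty(pg_relation_size('{index}'));",
            f"DROP INDEX {index};",
        ]
    return "\n".join(lines)


print(benchmark_sql(TABLE, COLUMN))
```

Build time and size are only part of the picture; running your typical queries with EXPLAIN ANALYZE while each index exists would cover lookup speed as well.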

Note that there are special use cases that require one or the other.

One thing you misunderstood: You never get wrong results with a GiST index. The index operates on hash values, which can produce false positives at the index level. This should only become relevant with a very large number of distinct words in your documents. In any case, false positives are eliminated by rechecking the actual row. The manual:

A GiST index is lossy, meaning that the index may produce false matches, and it is necessary to check the actual table row to eliminate such false matches. (PostgreSQL does this automatically when needed.)

The bold emphasis is mine.
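
To make "lossy" concrete, here is a toy model of a hash-based signature index with a recheck step. It is an illustration under simplifying assumptions, not PostgreSQL's actual implementation: the 16-bit signature width, the use of MD5, and all function names are made up for this sketch.

```python
import hashlib

SIGNATURE_BITS = 16  # deliberately tiny so that collisions are likely


def word_bit(word: str) -> int:
    """Hash a word to a single bit position in the signature."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % SIGNATURE_BITS


def signature(words) -> int:
    """OR together one bit per word to get a fixed-length signature."""
    sig = 0
    for word in words:
        sig |= 1 << word_bit(word)
    return sig


def build_index(docs):
    """The 'index' stores only one signature per row, not the words."""
    return [signature(words) for words in docs]


def search(docs, index, term):
    """Return (candidates, confirmed) row ids for a one-word query."""
    bit = 1 << word_bit(term)
    # Step 1: scan the signatures only. A set bit may come from a
    # different word that hashed to the same position (false positive).
    candidates = [i for i, sig in enumerate(index) if sig & bit]
    # Step 2: recheck against the actual row to drop false positives.
    confirmed = [i for i in candidates if term in docs[i]]
    return candidates, confirmed


docs = [
    {"gin", "index", "lookup", "fast"},
    {"gist", "index", "lossy", "signature"},
    {"table", "row", "recheck", "match"},
]
index = build_index(docs)
candidates, confirmed = search(docs, index, "lossy")
print("index candidates:", candidates)
print("after recheck:   ", confirmed)
```

A row that contains the term always has its bit set, so the index step never misses a true match; it can only return extra candidates, and the recheck removes those. That is why the final result is always correct.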

5 Comments

I believe you meant "You never get wrong results with a GIN index", right?
@IamIC: You never get wrong results with either GIN or GiST. But I am specifically addressing GiST in the answer because the OP had a wrong impression there.
Understood. That makes sense.
If there are more reads than records in your table, you need to use GIN.
I think the point "GIN indexes are two-to-three times larger than GiST indexes" is not valid: I tested with a few large tables and found the GiST index taking up more space than the GIN and B-tree indexes.
