
The official documentation states:

One advantage of the separate-column approach over an expression index is that ... Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches.

Why does a GIN expression index on to_tsvector('english', body) have to "verify index matches"? Indexes are updated automatically after every UPDATE/INSERT, and all indexes share that same update cost, so that should not be the concern here.

2 Answers


I think this refers to the "recheck", which is necessary because a GIN index scan is potentially lossy: it returns rows whose tsvector contains all the lexemes of the tsquery you search for. All these rows then get rechecked to see whether they really match the tsquery. That means the to_tsvector function is evaluated for every row returned by the index scan.
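To make the cost difference concrete, here is a toy Python model. It is not PostgreSQL's actual implementation. Assumptions: `to_tsvector` is a hypothetical stand-in that only lowercases and splits words (no stemming, no stop words), the "GIN index" is a plain dict mapping each lexeme to a set of row ids, and every candidate is rechecked (as happens when the bitmap goes lossy). The only difference between the two approaches is where the recheck gets its tsvector from.

```python
import re
from collections import defaultdict

# Counts how often the hypothetical to_tsvector stand-in is called.
CALLS = {"to_tsvector": 0}


def to_tsvector(text):
    """Toy stand-in for to_tsvector: word -> list of 1-based positions.

    No stemming, no stop words; it only lowercases and splits on letters.
    """
    CALLS["to_tsvector"] += 1
    vec = defaultdict(list)
    for pos, word in enumerate(re.findall(r"[a-z]+", text.lower()), start=1):
        vec[word].append(pos)
    return dict(vec)


def build_gin(vectors):
    """Toy inverted index: lexeme -> set of row ids (no positions kept)."""
    index = defaultdict(set)
    for row_id, vec in vectors.items():
        for lexeme in vec:
            index[lexeme].add(row_id)
    return index


def search(rows, index, terms, stored_vectors=None):
    """AND-search for terms; every index candidate is rechecked.

    stored_vectors=None models the expression index: the recheck must
    recompute to_tsvector(body). Passing stored vectors models the separate
    tsvector column: the recheck just reads the stored value.
    """
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    matches = []
    for row_id in sorted(candidates):
        if stored_vectors is None:
            vec = to_tsvector(rows[row_id])  # expression index: recompute
        else:
            vec = stored_vectors[row_id]  # separate column: just read it
        if all(t in vec for t in terms):
            matches.append(row_id)
    return matches


rows = {1: "The quick brown fox", 2: "A quick brown dog", 3: "Lazy cats sleep"}
vectors = {rid: to_tsvector(body) for rid, body in rows.items()}  # index build
index = build_gin(vectors)

CALLS["to_tsvector"] = 0
print(search(rows, index, ["quick", "brown"]), CALLS["to_tsvector"])  # [1, 2] 2
CALLS["to_tsvector"] = 0
print(search(rows, index, ["quick", "brown"], vectors), CALLS["to_tsvector"])  # [1, 2] 0
```

Both searches return the same rows; only the expression-index path pays for re-running to_tsvector on each candidate, which appears to be the saving the quoted documentation refers to.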


2 Comments

Thank you for the quick reply. So for both the "separate-column approach" and the "expression index approach", the recheck phase is unavoidable. The second approach needs the extra step of redoing the to_tsvector calls to get the tsvector, and this is why the doc says "Another advantage is that searches will be faster"?
I guess so, and I don't know for certain whether there will always be a recheck (I didn't bother to look at the code).

As the docs say, that is more important for GiST than for GIN.

GIN index results can still need to be rechecked if work_mem is too small to hold the entire bitmap, so the bitmap goes lossy. They will also need to be rechecked if the query uses phrase (relative position) operators like <->, <2>, etc.
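A rough Python model of the work_mem point, heavily simplified: real PostgreSQL bitmaps degrade page by page as memory runs out, whereas this sketch switches from exact row pointers to whole pages all at once when a hypothetical `max_exact` budget is exceeded. Row pointers are `(page, offset)` pairs, loosely mirroring PostgreSQL TIDs; the function names are illustrative only.

```python
def build_bitmap(tids, max_exact):
    """Return (exact_tids, lossy_pages) under a hypothetical memory budget.

    If all TIDs fit in max_exact entries the bitmap stays exact; otherwise
    it only remembers which pages contained a match (it has gone lossy).
    """
    if len(tids) <= max_exact:
        return set(tids), set()
    return set(), {page for page, _offset in tids}


def rows_to_visit(table_tids, index_tids, max_exact):
    """Return (TIDs to fetch from the heap, whether they must be rechecked)."""
    exact, lossy_pages = build_bitmap(index_tids, max_exact)
    if not lossy_pages:
        # Exact bitmap: only the matching rows are fetched; lossiness forces
        # no recheck here.
        return sorted(exact), False
    # Lossy bitmap: every row on a flagged page is fetched and must be
    # rechecked against the tsquery, since the bitmap no longer says which.
    return sorted(t for t in table_tids if t[0] in lossy_pages), True


table = {(0, 1), (0, 2), (0, 3), (1, 1), (1, 2)}
hits = [(0, 1), (1, 2)]
print(rows_to_visit(table, hits, max_exact=10))  # ([(0, 1), (1, 2)], False)
print(rows_to_visit(table, hits, max_exact=1))   # all five TIDs, True
```

With an expression index, each of those extra rechecked rows means another to_tsvector call, so a lossy bitmap makes the difference between the two approaches larger.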

It might also need rechecking if you have many AND-ed (&) tokens and it just decides to recheck the more common of them rather than bothering with all their bitmaps (I'm not sure whether it actually does this here; I've never witnessed it for @@, but without having inspected the entire code I can't rule out the possibility), or perhaps if you have complicated boolean tsquery expressions.

2 Comments

The docs for the newer version say: "As inverted indexes, they contain an index entry for each word (lexeme), with a compressed list of matching locations." So can positions in a tsquery be handled properly by the GIN index? But might a complicated tsquery still require a "recheck" on the matching row?
@MEDS No, 'locations' there only means the locations of the rows within the table, not the locations of the lexemes within the document. (That wording does seem to invite confusion.) There is a custom index type which does store the lexeme locations, and so could avoid the recheck. I've had mixed luck with it, but it might be worth a try.
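To illustrate the point about "locations", here is a toy Python model in which the inverted index stores only row ids per lexeme, as the comment describes, while the tsvector itself keeps word positions. The tokenizer and all names are hypothetical simplifications. For a phrase query such as `quick <-> brown` (where `<->` means `<1>`, i.e. exactly one position apart), the index can only narrow things down to rows containing both lexemes; adjacency has to be verified in the recheck using the positions.

```python
import re
from collections import defaultdict


def to_tsvector(text):
    """Toy tsvector: word -> list of 1-based positions (no stemming)."""
    vec = defaultdict(list)
    for pos, word in enumerate(re.findall(r"[a-z]+", text.lower()), start=1):
        vec[word].append(pos)
    return dict(vec)


def build_gin(vectors):
    """Toy inverted index: lexeme -> set of row ids. Positions are NOT kept."""
    index = defaultdict(set)
    for row_id, vec in vectors.items():
        for lexeme in vec:
            index[lexeme].add(row_id)
    return index


def phrase_recheck(vec, first, second, distance=1):
    """True if `second` appears exactly `distance` positions after `first`."""
    second_positions = set(vec.get(second, []))
    return any(p + distance in second_positions for p in vec.get(first, []))


def phrase_search(vectors, index, first, second, distance=1):
    """Model of `first <distance> second`: index narrows, recheck decides."""
    # The index alone only knows that both lexemes occur somewhere in the row.
    candidates = index.get(first, set()) & index.get(second, set())
    confirmed = [r for r in sorted(candidates)
                 if phrase_recheck(vectors[r], first, second, distance)]
    return sorted(candidates), confirmed


docs = {1: "quick brown fox", 2: "brown and quick fox", 3: "quick red brown"}
vectors = {rid: to_tsvector(body) for rid, body in docs.items()}
index = build_gin(vectors)
print(phrase_search(vectors, index, "quick", "brown"))     # ([1, 2, 3], [1])
print(phrase_search(vectors, index, "quick", "brown", 2))  # ([1, 2, 3], [3])
```

All three rows survive the index step; only the positional recheck separates them, which is why a GIN index without lexeme positions needs the recheck for `<->` and `<N>`.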
