I need to use Full Text Search with PostgreSQL, but I can't find a way to look up a list of words from a table (using to_tsquery) against an indexed text field (tsvector data type). Is to_tsquery only able to process a few words, or can it also process multiple values that come from a table?

Thanks in advance for your help.

  • to_tsquery can search for multiple tokens if you separate them with &: to_tsquery('foo & bar & baz'). So you could retrieve your list of words from your table and feed them (as tokens) to the to_tsquery function, separated by ampersands. Yet... a few code examples would help to better illustrate what you are trying to accomplish. Commented Oct 26, 2015 at 2:01
  • I'm trying to do something like this: SELECT * FROM table , to_tsquery(SELECT words FROM another_table) query WHERE Table.indexed_text_field @@ query; I think your solution would be great, but I don't know how to feed to_tsquery. Commented Oct 26, 2015 at 6:57
  • Formulated an answer based on your comment. Commented Oct 27, 2015 at 0:22
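
Feeding the words "separated by ampersands" can be done entirely in SQL with string_agg(), which concatenates the values of all rows using a separator. A minimal sketch, assuming each row of another_table.words holds a single word; the CTE with sample rows is hypothetical and only stands in for the real table, and the 'english' configuration is an assumption:

```sql
-- Hypothetical stand-in for another_table (one word per row); with the real
-- table, drop the WITH clause and keep the SELECT.
WITH another_table(words) AS (
    VALUES ('cat'), ('dog')
)
-- string_agg() glues the rows together as 'cat & dog' (row order is not
-- guaranteed, which does not matter for &), and to_tsquery() parses that
-- string into a single query.
SELECT to_tsquery('english', string_agg(words, ' & ')) AS query
FROM another_table;
```

The resulting tsquery goes on the right-hand side of @@. With ' | ' as the separator it matches rows containing any of the words instead of all of them.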

1 Answer

Let me try to formulate an answer according to the comments given on the question (if I understand your request correctly).

Problem

You are trying to do a full text search on the table tableA, column indexed_text_field (a tsvector type) based on words that are stored as text in another table tableB in a column called words.

Solution

First, if you wish to feed PostgreSQL multiple tokens (individual words) during a full text search you have two functions at your disposal:

  • to_tsquery()
  • plainto_tsquery()

With the first function you need to separate the tokens with an ampersand (&). The second function accepts any string of plain text, splits it into tokens for you and combines them with &. More info is in the PostgreSQL documentation on parsing queries.
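
A quick side-by-side of the two functions. The 'english' configuration is an assumption here (without it, both use default_text_search_config), and the sample strings are illustrative only:

```sql
-- to_tsquery() expects the operators to be written out explicitly;
-- plainto_tsquery() takes plain text, drops stop words such as "The"
-- and joins the remaining lexemes with &.
-- Both columns come back as: 'fat' & 'rat'
SELECT to_tsquery('english', 'fat & rats')        AS explicit_operators,
       plainto_tsquery('english', 'The Fat Rats') AS plain_text;

-- Without operators, to_tsquery() raises a syntax error instead:
-- SELECT to_tsquery('english', 'fat rats');
```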

Your challenge is that you wish to select matches based on words present in another table. This can be done in different ways, for example via a simple (INNER) JOIN:

SELECT a.* FROM tableA a, tableB b WHERE a.indexed_text_field @@ to_tsquery(b.words);
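
When each row of words holds a single word, the join can also be replaced by one combined query: an uncorrelated subquery aggregates all words with ' | ' (OR), which keeps the "matches any word" meaning of the join. A sketch, assuming one word per row; the CTEs hold hypothetical sample data standing in for the real tables, and the id column and 'english' configuration are assumptions:

```sql
-- Hypothetical sample data standing in for the real tables.
WITH tableA(id, indexed_text_field) AS (
    VALUES (1, to_tsvector('english', 'The cat chased the dog')),
           (2, to_tsvector('english', 'A bird sang')),
           (3, to_tsvector('english', 'Dogs bark'))
),
tableB(words) AS (
    VALUES ('cat'), ('dog')
)
SELECT a.id
FROM tableA a
WHERE a.indexed_text_field @@ (
    -- The uncorrelated subquery builds 'cat | dog' from all rows of tableB
    -- and turns it into a single OR query.
    SELECT to_tsquery('english', string_agg(b.words, ' | '))
    FROM tableB b
)
ORDER BY a.id;
-- returns ids 1 and 3
```

Unlike the join, each matching document comes back only once. On the real tables the subquery does not depend on the outer row, so PostgreSQL can evaluate it once and the @@ condition can use the index on indexed_text_field; whether it actually does is worth confirming with EXPLAIN.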

Or if you have multiple words in the words column you should most likely be using the plainto_tsquery() function to keep things simple:

SELECT a.* FROM tableA a, tableB b WHERE a.indexed_text_field @@ plainto_tsquery(b.words);
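
One thing to keep in mind with the join form: a row of tableA that matches several rows of tableB is returned once per matching row. If each document should appear only once, a semi-join with EXISTS is one option. A sketch with hypothetical sample data in CTEs standing in for the real tables (the id column and the 'english' configuration are assumptions):

```sql
-- Hypothetical sample data standing in for the real tables.
WITH tableA(id, indexed_text_field) AS (
    VALUES (1, to_tsvector('english', 'The cat chased the dog')),
           (2, to_tsvector('english', 'A bird sang'))
),
tableB(words) AS (
    VALUES ('cat'), ('dog')
)
-- EXISTS only asks whether at least one row of tableB matches, so
-- document 1 is returned once even though it matches both 'cat' and 'dog'.
SELECT a.id
FROM tableA a
WHERE EXISTS (
    SELECT 1
    FROM tableB b
    WHERE a.indexed_text_field @@ plainto_tsquery('english', b.words)
);
```

With the plain join, the same sample data yields two rows for document 1.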

Yet, if you must use the more low-level to_tsquery() version:

SELECT a.* FROM tableA a, tableB b WHERE a.indexed_text_field @@ to_tsquery(regexp_replace(btrim(b.words), '\s+', ' & ', 'g'));

In the latter you replace the whitespace between the words with ampersands, turning them into separate tokens. Mind the index usage on all of these queries though: the index that matters is the GiST or GIN index on a.indexed_text_field. The string manipulation only builds the query value from tableB, so it does not need an expression index of its own.
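
Before handing a transformed string to to_tsquery(), it helps to look at what the transformation actually produces. Splitting on single spaces with replace(b.words, ' ', '&') is fragile: a double space becomes &&, and leading or trailing spaces leave a dangling &, which to_tsquery() rejects as a syntax error. Trimming and collapsing runs of whitespace with regexp_replace() avoids that. A sketch with a hypothetical input string and an assumed 'english' configuration:

```sql
SELECT replace('foo  bar', ' ', '&')                             AS naive,     -- 'foo&&bar'
       regexp_replace(btrim('  foo  bar '), '\s+', ' & ', 'g')    AS collapsed, -- 'foo & bar'
       -- btrim() strips the outer spaces, '\s+' matches each run of
       -- whitespace, and the 'g' flag replaces every run, not just the first.
       to_tsquery('english',
                  regexp_replace(btrim('  foo  bar '), '\s+', ' & ', 'g')) AS query;
```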

4 Comments

Thank you very much for the clear answer, it has been really useful. Now I'm facing some performance problems running these queries: the table with indexed_text_field has 23 million records and the table with words has 140 records, and the query takes more than 5 hours to finish. Is that expected?
You are very welcome. If the answer helped you, please accept it. Regarding the performance, 5 hours is awfully long for this run. Without a proper look at your database setup, my first guess would be that your tsvector data is missing a database index (GiST or GIN). What kind of indexes do you have on your data?
I'm using GIST for both fields.
Well, the next step is to see what the planner is doing then... What is the output of EXPLAIN ANALYZE on the same query that took 5 hours? (By the way, which of the queries above did you run?)
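
A sketch of the usual first steps for the performance question, assuming the tableA/tableB names from the answer (the index name is hypothetical). The PostgreSQL documentation describes GIN as the preferred index type for text search; GiST text search indexes are lossy, so every GiST hit has to be rechecked against the table row. An index on the 140-row words table is unlikely to matter. Note that EXPLAIN ANALYZE actually executes the query, so on the 5-hour version plain EXPLAIN is the quicker first look. The two temp tables only make the sketch run on its own:

```sql
-- Hypothetical stand-ins so the sketch runs on its own; on the real
-- database skip these two statements and use the existing tables.
CREATE TEMP TABLE tableA (id serial PRIMARY KEY, indexed_text_field tsvector);
CREATE TEMP TABLE tableB (words text);

-- GIN index on the tsvector column of the large table
-- (the index name is hypothetical).
CREATE INDEX tablea_indexed_text_field_gin
    ON tableA USING GIN (indexed_text_field);

-- Refresh the planner statistics for the table.
ANALYZE tableA;

-- Runs the query and reports the plan with actual row counts and timings;
-- on the real data, look for an index scan on the GIN index (for example a
-- Bitmap Index Scan) rather than a Seq Scan over the whole table.
EXPLAIN ANALYZE
SELECT a.*
FROM tableA a, tableB b
WHERE a.indexed_text_field @@ plainto_tsquery(b.words);
```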