14

I don't really understand the difference between to_tsvector and ::tsvector in Postgres. I read the documentation on to_tsvectorhere, but there doesn't seem to be any documentation for the other, ::tsvector - that's a bit problematic. It is mentioned here, but it says something about normalizing before querying, and normalization is done through to_tsvector.

I created this SQL Fiddle to demonstrate the two; here is the code if you don't want to navigate away:

DDL:

CREATE TABLE text (
  text_id serial PRIMARY KEY,
  source_text text NOT NULL,
  destination_text text NOT NULL
);

and the SQL:

-- Throw some stuff in there
INSERT INTO text (source_text, destination_text) VALUES
('Hello', 'Hello Result'),
('With Comma, Query', 'WithComma, Result');

-- Forced to use punctuation in the query to match what is in the vector
SELECT T.source_text, T.destination_text
FROM   text T
WHERE  LOWER(T.source_text)::tsvector @@ LOWER('Hello')::tsquery;

-- Vector free of punctuation, don't include it in the query
SELECT T.source_text, T.destination_text
FROM   text T
WHERE  to_tsvector(LOWER(T.source_text)) @@ LOWER('Comma')::tsquery;

SELECT ts_debug('english', 'Something without a comma');

SELECT ts_debug('english', 'Something, with a comma');

It seems to me that to_tsvector will take the text, strip it of punctuation and return the vector. On the other hand, ::tsvector seems to include the punctuation in the vector, resulting in a need to use the same punctuation in the query.

What are the practical differences between the two? Is one generally preferred over the other? In what situations is each preferred?

1 Answer 1

23

to_tsvector(text) reads the string and does some normalization (taking language settings into account) on the string.

::tsvector is a cast. It does not do any normalization (and does not care for language settings).

See: http://www.postgresql.org/docs/current/interactive/functions-textsearch.html

Some years ago a wrote an own to_tsvector() in python, since I was not happy with the way this was handled in postgres. This gave me more control.

To insert the data into the column I used the tsvector cast:

'UPDATE myapp_mymodel SET content=%s::tsvector where id=%s', [tsvector, self.id])
Sign up to request clarification or add additional context in comments.

1 Comment

Ha, ::tsvector is a cast... I get it now. That was not obvious to me in the documentation - I was using ::tsvector early on in my project because I saw it widely used in examples, namely here. Thank you for the guidance!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.