Postgres to_tsvector VS ::tsvector

Question

I don't really understand the difference between to_tsvector and ::tsvector in Postgres. I read the documentation on to_tsvectorhere, but there doesn't seem to be any documentation for the other, ::tsvector - that's a bit problematic. It is mentioned here, but it says something about normalizing before querying, and normalization is done through to_tsvector.

I created this SQL Fiddle to demonstrate the two; here is the code if you don't want to navigate away:

DDL:

CREATE TABLE text (
  text_id serial PRIMARY KEY,
  source_text text NOT NULL,
  destination_text text NOT NULL
);

and the SQL:

-- Throw some stuff in there
INSERT INTO text (source_text, destination_text) VALUES
('Hello', 'Hello Result'),
('With Comma, Query', 'WithComma, Result');

-- Forced to use punctuation in the query to match what is in the vector
SELECT T.source_text, T.destination_text
FROM   text T
WHERE  LOWER(T.source_text)::tsvector @@ LOWER('Hello')::tsquery;

-- Vector free of punctuation, don't include it in the query
SELECT T.source_text, T.destination_text
FROM   text T
WHERE  to_tsvector(LOWER(T.source_text)) @@ LOWER('Comma')::tsquery;

SELECT ts_debug('english', 'Something without a comma');

SELECT ts_debug('english', 'Something, with a comma');

It seems to me that to_tsvector will take the text, strip it of punctuation and return the vector. On the other hand, ::tsvector seems to include the punctuation in the vector, resulting in a need to use the same punctuation in the query.

What are the practical differences between the two? Is one generally preferred over the other? In what situations is each preferred?

guettli · Accepted Answer · 2017-10-12 07:18:00Z

23

to_tsvector(text) reads the string and does some normalization (taking language settings into account) on the string.

::tsvector is a cast. It does not do any normalization (and does not care for language settings).

See: http://www.postgresql.org/docs/current/interactive/functions-textsearch.html

Some years ago a wrote an own to_tsvector() in python, since I was not happy with the way this was handled in postgres. This gave me more control.

To insert the data into the column I used the tsvector cast:

'UPDATE myapp_mymodel SET content=%s::tsvector where id=%s', [tsvector, self.id])

edited Oct 12, 2017 at 7:18

user330315

answered Jun 4, 2014 at 20:08

guettli

27.7k109 gold badges423 silver badges779 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Chris Cirefice Over a year ago

Ha, ::tsvector is a cast... I get it now. That was not obvious to me in the documentation - I was using ::tsvector early on in my project because I saw it widely used in examples, namely here. Thank you for the guidance!

Collectives™ on Stack Overflow

Postgres to_tsvector VS ::tsvector

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related