25

I need to create a PostgreSQL Full Text Search index in Python with SQLAlchemy. Here's what I want in SQL:

CREATE TABLE person ( id INTEGER PRIMARY KEY, name TEXT );
CREATE INDEX person_idx ON person USING GIN (to_tsvector('simple', name));

Now how do I do the second part with SQLAlchemy when using the ORM:

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

5 Answers

36

You can create the index using Index in __table_args__. I also use a function to build the tsvector expression, which keeps things tidy and reusable when more than one field is required. Something like this:

from sqlalchemy import Index, cast
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql import func

def create_tsvector(*args):
    exp = args[0]
    for e in args[1:]:
        exp += ' ' + e
    return func.to_tsvector('english', exp)

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

    __ts_vector__ = create_tsvector(
        cast(func.coalesce(name, ''), postgresql.TEXT)
    )

    # The trailing comma matters: __table_args__ must be a tuple.
    __table_args__ = (
        Index(
            'idx_person_fts',
            __ts_vector__,
            postgresql_using='gin'
        ),
    )

Update: A sample query using index (corrected based on comments):

people = Person.query.filter(Person.__ts_vector__.match(expressions, postgresql_regconfig='english')).all()

3 Comments

Could you give an example on how to query the vector? Thanks.
@sharez With the latest version, query.all() throws an error: Neither 'BinaryExpression' object nor 'Comparator' object has an attribute 'all'. Could you provide an alternative?
@apaleja match is an operator, hence it should be within a filter method, as such: Person.query.filter(Person.__ts_vector__.match(expressions, postgresql_regconfig='english')).all()
23

The answer from @sharez is really useful (especially if you need both a tsvector column and index). If you only need the tsvector GIN index and not the extra column, then you can use one of the approaches below.

Read the concise PostgreSQL documentation section Full Text Search - Tables and Indexes for a straightforward explanation of full text search without an index, with an index, and with a column plus an index. When using an index-only approach, pay attention to the following paragraph from those docs (where it says "above", you can refer to the index creation examples below if you have not read the docs):

Because the two-argument version of to_tsvector was used in the index above, only a query reference that uses the 2-argument version of to_tsvector with the same configuration name will use that index. That is, WHERE to_tsvector('english', body) @@ 'a & b' can use the index, but WHERE to_tsvector(body) @@ 'a & b' cannot. This ensures that an index will be used only with the same configuration used to create the index entries.

The "two-argument version" refers to the configuration argument, such as 'english', and the field(s) argument. Yes, you are required to pass the configuration argument when creating the index (see the docs for more explanation), so you will also need it in your queries.
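
To make the quoted rule concrete, here is a minimal sketch of a query whose left-hand side repeats the two-argument to_tsvector call with the same configuration as the index. The Example model mirrors the single-column example below, the search string 'a & b' comes from the docs quote, and compiling against the PostgreSQL dialect only shows the shape of the SQL without needing a database. Whether the planner actually uses the index still depends on PostgreSQL.

```python
from sqlalchemy import Column, Integer, String, func, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    textsearch = Column(String)


# Same function, same configuration, same column as the index expression.
indexed_expr = func.to_tsvector('english', Example.textsearch)

stmt = select(Example.id).where(
    indexed_expr.op('@@')(func.to_tsquery('english', 'a & b'))
)

# Render the statement for PostgreSQL; values appear as bind placeholders.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```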

The first example below creates a tsvector GIN index on a single column; the second creates the index on multiple columns. Note that the comma following Index(...) in __table_args__ is not a style choice: the value of __table_args__ must be a tuple, dictionary, or None.
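
As a quick illustration of why that comma matters, this plain-Python sketch (no SQLAlchemy involved, with a string standing in for the Index object) shows that parentheses alone do not create a tuple:

```python
# Parentheses around a single value are just grouping.
without_comma = ('ix_examples_tsv')

# The trailing comma is what makes a one-element tuple.
with_comma = ('ix_examples_tsv',)

print(type(without_comma).__name__)
print(type(with_comma).__name__)
```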

Single column:

from sqlalchemy import Column, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

class Example(Base):
    __tablename__ = 'examples'
    
    id = Column(Integer, primary_key=True)
    textsearch = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            func.to_tsvector('english', textsearch),
            postgresql_using='gin'
            ),
        )
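
If you want to check the DDL before running it, one option is to compile the index against the PostgreSQL dialect, which needs no database connection. This is only a sketch: it wraps the configuration name in literal(), which is an assumption based on the comments below (reported as necessary on SQLAlchemy 2.0) rather than part of the example above.

```python
from sqlalchemy import Column, Index, Integer, String, func, literal
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateIndex

Base = declarative_base()


class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    textsearch = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            # literal() is an assumption taken from the comments, see above.
            func.to_tsvector(literal('english'), textsearch),
            postgresql_using='gin',
        ),
    )


# The table has exactly one index; compile its CREATE INDEX statement.
index = next(iter(Example.__table__.indexes))
ddl = str(CreateIndex(index).compile(dialect=postgresql.dialect()))
print(ddl)
```

The printed statement can then be compared with the CREATE INDEX you would have written by hand.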

Multiple columns using text():

from sqlalchemy import Column, Index, Integer, String, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

def to_tsvector_ix(*columns):
    s = " || ' ' || ".join(columns)
    return func.to_tsvector('english', text(s))

class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    atext = Column(String)
    btext = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            to_tsvector_ix('atext', 'btext'),
            postgresql_using='gin'
            ),
        )
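
One caveat with plain concatenation: in PostgreSQL, || returns NULL when either operand is NULL, so a row with a NULL in any listed column would produce a NULL tsvector. The PostgreSQL docs' own multi-column example wraps each column in coalesce. A minimal sketch of a string helper that does the same (coalesced_concat is a hypothetical name, not part of SQLAlchemy):

```python
def coalesced_concat(*columns):
    """Build a SQL fragment joining columns with spaces, treating NULL as ''."""
    parts = ["coalesce({}, '')".format(col) for col in columns]
    return " || ' ' || ".join(parts)


fragment = coalesced_concat('atext', 'btext')
print(fragment)
```

The result can be passed to text() in place of s in to_tsvector_ix above. Column names are interpolated as-is, so only use it with trusted, hard-coded names.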

6 Comments

Could you give an example on how to query the vector? Thanks.
@apaleja - see example query at the bottom of this answer: stackoverflow.com/questions/13837111/…
Note that with this I got: sqlalchemy.exc.CompileError: No literal value renderer is available for literal value "'english'" with datatype REGCONFIG. I needed to change "english" to sqlalchemy.literal("english").
Thanks @DustinOprea for your comment! It helped me to fix the same error when upgrading from SQLAlchemy 1.4.x to 2.0.x!
@bjornaer - PostgreSQL decides whether to use the index depending on the query (index only works in lots of situations but sometimes column plus index is needed for performance). PostgreSQL docs explain full text search queries for no index, index, and column plus index at Full Text Search - Tables and Indexes and you can translate to SQLAlchemy from there. As an aside, someone needs to ask a good, specific question about PostgreSQL full text search queries with SQLAlchemy, so this index-specific question stays index-specific.
14

Thanks for this question and answers.

I'd like to add a bit more for people who use Alembic to manage migrations: autogenerate does not seem to detect the creation of this index.

You may end up writing your own migration script, which looks like this:

"""add fts idx

Revision ID: e3ce1ce23d7a
Revises: 079c4455d54d
Create Date: 

"""

# revision identifiers, used by Alembic.
revision = 'e3ce1ce23d7a'
down_revision = '079c4455d54d'

from alembic import op
import sqlalchemy as sa


def upgrade():
    op.create_index('idx_content_fts', 'table_name',
            [sa.text("to_tsvector('english', content)")],
            postgresql_using='gin')


def downgrade():
    op.drop_index('idx_content_fts')


11

This has already been answered by @sharez and @benvc, but I needed to make it work with weights. This is how I did it, based on their answers:

from sqlalchemy import Column, func, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.operators import op

CONFIG = 'english'

Base = declarative_base()

def create_tsvector(*args):
    # Each argument is a (column, weight) pair; weights are 'A' to 'D'.
    field, weight = args[0]
    exp = func.setweight(func.to_tsvector(CONFIG, field), weight)
    for field, weight in args[1:]:
        # Concatenate the weighted tsvectors with the || operator.
        exp = op(exp, '||', func.setweight(func.to_tsvector(CONFIG, field), weight))
    return exp

class Example(Base):
    __tablename__ = 'example'

    # The ORM needs a primary key to map the class.
    id = Column(Integer, primary_key=True)
    foo = Column(String)
    bar = Column(String)

    __ts_vector__ = create_tsvector(
        (foo, 'A'),
        (bar, 'B')
    )

    __table_args__ = (
        Index('my_index', __ts_vector__, postgresql_using='gin'),
    )
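
For querying, a minimal sketch under some assumptions: the query rebuilds the same weighted expression as the index, matches it with @@ against websearch_to_tsquery (available in PostgreSQL 11 and later), and orders by ts_rank so the weights influence the ordering. The weighted helper, the id column and the search string are illustrative choices, and compiling only shows the SQL shape; it does not prove that the planner uses the index.

```python
from sqlalchemy import Column, Integer, String, func, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

CONFIG = 'english'
Base = declarative_base()


def weighted(column, weight):
    # Hypothetical helper: one weighted tsvector for a single column.
    return func.setweight(func.to_tsvector(CONFIG, column), weight)


class Example(Base):
    __tablename__ = 'example'

    id = Column(Integer, primary_key=True)
    foo = Column(String)
    bar = Column(String)


# Same shape as the indexed expression: setweight(...) || setweight(...)
ts_vector = weighted(Example.foo, 'A').op('||')(weighted(Example.bar, 'B'))
ts_query = func.websearch_to_tsquery(CONFIG, 'some search terms')

stmt = (
    select(Example.id)
    .where(ts_vector.op('@@')(ts_query))
    .order_by(func.ts_rank(ts_vector, ts_query).desc())
)

sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```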

1 Comment

Can you give an example of how this would be queried? I am using query.filter(Example.__ts_vector__.op("@@")(func.websearch_to_tsquery(FTS_CONFIG, search_term))).all(). However, I always get an empty list.
3

Previous answers here were helpful for pointing in the right direction. Below is a distilled, simplified approach using the ORM and the TSVectorType helper from sqlalchemy-utils (the helper is quite basic and can simply be copied and pasted to avoid an external dependency if needed: https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html).

Define a TSVECTOR column (TSVectorType) in your declarative ORM model, populated automatically from the source text field(s):

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base
from sqlalchemy_utils.types.ts_vector import TSVectorType
# ^-- https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html

Base = declarative_base()


class MyModel(Base):
    __tablename__ = 'mymodel'
    id = sa.Column(sa.Integer, primary_key=True)
    content = sa.Column(sa.String, nullable=False)

    content_tsv = sa.Column(
        TSVectorType("content", regconfig="english"),
        sa.Computed("to_tsvector('english', \"content\")", persisted=True))
    #      ^-- equivalent for SQL:
    #   COLUMN content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "content")) STORED;

    __table_args__ = (
        # Indexing the TSVector column
        sa.Index("idx_mymodel_content_tsv", content_tsv, postgresql_using="gin"), 
    )

For additional details on querying using ORM, see https://stackoverflow.com/a/73999486/11750716 (there is an important difference between SQLAlchemy 1.4 and SQLAlchemy 2.0).
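
As a rough sketch of querying the generated column: the version below calls plainto_tsquery explicitly through func, so it does not depend on how .match() is rendered (as far as I know, 1.4 renders to_tsquery and 2.0 renders plainto_tsquery). To stay dependency-free it uses the built-in postgresql.TSVECTOR type instead of TSVectorType, which is an assumption on my part, and the search string is made up.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class MyModel(Base):
    __tablename__ = 'mymodel'

    id = sa.Column(sa.Integer, primary_key=True)
    content = sa.Column(sa.String, nullable=False)
    # Built-in TSVECTOR type used here instead of TSVectorType (assumption).
    content_tsv = sa.Column(
        postgresql.TSVECTOR,
        sa.Computed("to_tsvector('english', \"content\")", persisted=True),
    )

    __table_args__ = (
        sa.Index('idx_mymodel_content_tsv', content_tsv, postgresql_using='gin'),
    )


# Match the stored tsvector column against an explicit tsquery.
ts_query = sa.func.plainto_tsquery('english', 'search words')
stmt = sa.select(MyModel.id).where(MyModel.content_tsv.op('@@')(ts_query))

sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```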
