I have a huge words table that I run LIKE queries against:

create table words
(
  id   int,
  word varchar
)

The queries are slow. An index doesn't help much, so I'm trying to partition the table by the word column:

create table words
(
  id   int,
  word varchar
) partition by RANGE (word);

CREATE TABLE words_1 PARTITION OF words
  FOR VALUES FROM ('a') TO ('n');

CREATE TABLE words_2 PARTITION OF words
  FOR VALUES FROM ('n') TO ('z');

NOTE: I actually plan to create one partition per letter; I use only two here to keep the example simple.
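
For illustration, the per-letter layout could be generated with a short script rather than typed by hand. This is only a sketch: the helper name letter_partitions and the words_a ... words_z naming are hypothetical, lowercase ASCII words are assumed, and leaving the last partition open-ended with MAXVALUE is my assumption, not part of the two-partition example above.

```python
import string

def letter_partitions(parent="words"):
    """Build one CREATE TABLE ... PARTITION OF statement per lowercase letter."""
    letters = string.ascii_lowercase
    statements = []
    for i, low in enumerate(letters):
        # The upper bound is the next letter; the last partition is open-ended
        # (assumption) so that words starting with 'z' still have a home.
        high = f"'{letters[i + 1]}'" if i + 1 < len(letters) else "MAXVALUE"
        statements.append(
            f"CREATE TABLE {parent}_{low} PARTITION OF {parent}\n"
            f"  FOR VALUES FROM ('{low}') TO ({high});"
        )
    return statements

if __name__ == "__main__":
    print("\n\n".join(letter_partitions()))
```

Words that sort before 'a' (digits, uppercase letters) would not fit any of these ranges, so a real setup would probably also need a default partition.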

So partitioning seems to work OK with equality and gt/lt operators:

explain
select * from words where word = 'abc'

Seq Scan on words_1 words  (cost=0.00..25.88 rows=6 width=36)
  Filter: ((word)::text = 'abc'::text)

explain
select * from words where word >= 'nth'

Seq Scan on words_2 words  (cost=0.00..25.88 rows=423 width=36)
  Filter: ((word)::text >= 'nth'::text)

But on LIKE queries it keeps scanning both partitions:

explain
select * from words where word LIKE 'abc%'
Append  (cost=0.00..51.81 rows=12 width=36)
  ->  Seq Scan on words_1  (cost=0.00..25.88 rows=6 width=36)
        Filter: ((word)::text ~~ 'abc'::text)
  ->  Seq Scan on words_2  (cost=0.00..25.88 rows=6 width=36)
        Filter: ((word)::text ~~ 'abc'::text)

Is there a way to make partitioning work on LIKE queries?

Or is there another way to achieve what I want?

  • LIKE 'abc' gives same result. Anyway, like without % doesn't make a lot of sense Commented Mar 8, 2024 at 17:48
  • Do you need an id? Could the word be the primary key? Commented Mar 8, 2024 at 18:35
  • Could you show us the sort of slow queries you're doing on this table? Commented Mar 8, 2024 at 18:36
  • Try an explain analyze to make sure the table statistics are up to date. Commented Mar 8, 2024 at 18:38
  • It's pretty difficult to describe the entire picture. The table is a tokenized text field of another table and is used as a kind of index for word-based search. The id is a foreign key here. The query itself joins a few more tables. explain analyze says the scan of this table takes 25 ms out of 27. Commented Mar 8, 2024 at 19:06

2 Answers

2

I'm a bit surprised it doesn't just work, at least in the C collation. But I can verify that it doesn't.

You could rewrite the query manually the same way word like 'abc%' sometimes gets rewritten:

explain analyze
select * from words where word >='abc' and word <'abd'

But this is only guaranteed to give the same answer in the C collation.
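
If the prefix comes from application code, the two bounds can be computed there. Below is a minimal sketch, assuming strings compare by code point (which is what the C collation does with UTF-8 text) and that the prefix contains no LIKE wildcards of its own. The helper name prefix_range is hypothetical.

```python
MAX_CODE_POINT = 0x10FFFF

def prefix_range(prefix):
    """Return (lower, upper) such that s.startswith(prefix) is equivalent to
    lower <= s < upper under code-point ordering (assumed: C collation).
    upper is None when the range has no finite upper bound."""
    chars = list(prefix)
    # Trailing characters already at the maximum cannot be incremented; drop them.
    while chars and ord(chars[-1]) == MAX_CODE_POINT:
        chars.pop()
    if not chars:
        return prefix, None
    nxt = ord(chars[-1]) + 1
    if 0xD800 <= nxt <= 0xDFFF:
        nxt = 0xE000  # skip surrogates, which cannot be stored as UTF-8 text
    chars[-1] = chr(nxt)
    return prefix, "".join(chars)

lower, upper = prefix_range("abc")
print(lower, upper)  # abc abd
```

The two values would then be bound as parameters to word >= ... and word < ... in the query above. In any other collation the sort order is not code-point order, so these bounds are not guaranteed to match the LIKE result.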

By the way, you should check partition pruning with EXPLAIN ANALYZE. Partition pruning can happen only at run time, in which case all partitions still show up in the plain EXPLAIN plan. (But run-time pruning isn't what is happening here; I checked.)

For non-C collation, you can use text_pattern_ops as Laurenz alludes to. As long as you spell it correctly, unlike my first attempt.

create table words2
(
  id   int,
  word varchar
) partition by RANGE (word text_pattern_ops);

But then you still need to rewrite the query in order to make the partition pruning happen.

explain analyze
select * from words2 where word like 'abc%' and word ~>=~ 'abc' and word  ~<~ 'abd';

You need both the LIKE and the inequality range, because the range alone might return false positives. (I don't know exactly when it might do that, but since the planner is worried about it, I figure that worry is well-founded.)

2 Comments

With collations other than C, you should be able to use text_pattern_ops in PARTITION BY. Did you try that?
Yes, that works. see edit. I had just typed the operator name wrong and assumed the error meant it wouldn't work at all. It still doesn't work with just the LIKE though, you need the manual intervention.
1

I cannot replicate your results. Your queries use an index, even with only a handful of rows. Make sure your tables have been analyzed and run your queries with explain analyze.


An improved table design would drop the id and use the word as a primary key, assuming they're unique. I've added about 1000 words and analyzed the table.

create table words ( word text primary key );

copy words(word) from '/Users/schwern/tmp/words.txt';

analyze words;

All of your queries do an index-only scan.

The default B-tree index can do exact matches (word = 'this'), trailing wildcards (word like 'this%', in the C collation or with a text_pattern_ops index otherwise), and ordering. We can improve on this by adding a GiST index using trigram ops.

create extension if not exists pg_trgm;  -- provides gist_trgm_ops
create index word_gist_idx on words using gist (word gist_trgm_ops);

Now queries such as word like '%this%' can use the GiST index.

9 Comments

planning to consider GIST & pg_trgm as another option after dealing with partitioning. just wondering, can partitioning be an option at all?
The problem is not index usage but partition scanning. Any string that matches the abc% pattern is greater than a and less than n, which are the bounds of the partition range. So the question is: why can't a LIKE query determine the exact partition to scan?
@zoryamba The best a partition will do is take you from scanning all words to scanning all words in a partition, and only for very particular queries. You still need indexes. I don't know why like is scanning all tables, but I also don't know why it isn't using your index. We need more information.
found why it doesn't use index: stackoverflow.com/questions/61422684/… so it's not the question. question is about like and partitioning
@zoryamba That doesn't seem correct. Your question shows none of your queries using the index, even a simple word = 'this'. A btree should be sufficient for all the questions, but it isn't working for you. If we can't reproduce your problem, we can't say why like isn't acting smarter about your partitions. We need more detail, something is missing from your question. Preferably a demonstration in dbfiddle.