Avoiding duplicate DB indexes with Django's unique_together

Question

I have a project running on PostgreSQL v9.4 and Django 1.6 (planning to upgrade to v1.8 soon...)

Let's say I have the following model:

from django.db import models

class Product(models.Model):
    x = models.ForeignKey(Shop)  # Shop is another model, defined elsewhere
    y = models.PositiveIntegerField()
    class Meta:
        unique_together = (("x", "y"), )

As far as I understand:

Field x is a foreign key, so unless requested otherwise, Django automatically adds a db index for (x)
Due to the unique_together statement, PostgreSQL implicitly generates a combined index for (x, y)
Generally, an index for (x, y) also serves as an index for (x)
Duplicate indexes add overhead to SQL INSERT operations

My conclusion is that in such a scenario, it is better to explicitly declare db_index=False for field x, to avoid the duplicate index.

Is this a valid conclusion?

Sami Kuhmonen · Accepted Answer · 2015-05-21 14:13:33Z

As per the documentation, an index on (x, y) also serves queries that filter on x alone, because x is the leading column of the index. An index on (y, x) would be much less efficient for those queries. So in that sense you are correct.
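The leading-column behaviour can be checked with a query plan. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained); the table and index names are hypothetical. On PostgreSQL the equivalent check would be to run EXPLAIN against the real table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, x INTEGER NOT NULL, y INTEGER NOT NULL)"
)
# Only the composite unique index exists; there is no separate index on x.
conn.execute("CREATE UNIQUE INDEX product_x_y_uniq ON product (x, y)")


def query_plan(sql):
    """Return the plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the human-readable description.
    return [row[3] for row in rows]


# Filtering on the leading column x: the composite index can be searched.
plan_x = query_plan("SELECT y FROM product WHERE x = 1")
# Filtering on y alone: the index cannot be searched by its leading column.
plan_y = query_plan("SELECT x FROM product WHERE y = 1")

print(plan_x)
print(plan_y)
```

The first plan should report a search on product_x_y_uniq, while the second should fall back to some form of scan. The exact wording of the plan text differs between SQLite versions, and PostgreSQL's planner makes its own cost-based choices, so treat this as an illustration rather than a prediction for a production database.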

Tables with multiple indexes will consume more time on insert since more indexes will be updated. The amount of time depends on the table structure.
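A rough way to see the cost of the redundant index is to compare how much storage the same data needs with and without it. This sketch again uses sqlite3 as a stand-in for PostgreSQL (an assumption), with hypothetical table and index names and made-up data. It measures pages rather than insert time, since timings are noisy, but every extra page is index data that had to be written during the inserts.

```python
import sqlite3


def pages_used(extra_index_on_x, rows=10000):
    """Load identical rows and return the number of database pages used."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        "id INTEGER PRIMARY KEY, x INTEGER NOT NULL, y INTEGER NOT NULL)"
    )
    conn.execute("CREATE UNIQUE INDEX product_x_y_uniq ON product (x, y)")
    if extra_index_on_x:
        # Mimics the index Django adds automatically for a ForeignKey.
        conn.execute("CREATE INDEX product_x ON product (x)")
    # y is unique per row, so every (x, y) pair satisfies the unique index.
    conn.executemany(
        "INSERT INTO product (x, y) VALUES (?, ?)",
        ((i % 100, i) for i in range(rows)),
    )
    conn.commit()
    (pages,) = conn.execute("PRAGMA page_count").fetchone()
    conn.close()
    return pages


with_extra = pages_used(extra_index_on_x=True)
without_extra = pages_used(extra_index_on_x=False)
print(with_extra, without_extra)
```

On PostgreSQL, comparing the actual index sizes and timing inserts against realistic data would give a more meaningful answer.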

In the end it depends on how much data y adds to the index. If y varies a lot, the (x, y) index may be considerably larger than an index on x alone. In that case, queries that filter only on x will load and process more data from the index. If y varies little and the combined index is not much larger, it may be better to keep only the (x, y) index.

But, as always, this requires benchmarking, benchmarking and benchmarking.

Note on data integrity

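If you decide to drop the index on x and only have the unique index on (x, y), then uniqueness is enforced for the pair only: x itself may contain non-unique values, as long as y is different. Note that the index Django creates automatically for a foreign key is not a unique index, so this is equally true while both indexes exist, and db_index=False does not remove the foreign key constraint itself. Depending on the implementation this may or may not be important.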

edited May 21, 2015 at 14:13

answered May 21, 2015 at 13:55

Sami Kuhmonen

2 Comments

o_c Over a year ago

Perhaps I was not clear enough in my explanation: I need the (x, y) index for sure (benchmarking showed so as well). The question was about the (x) index that is created automatically because x is a foreign key: does it make sense to explicitly ask Django not to create it (db_index=False)?

Sami Kuhmonen Over a year ago

Yes, I did understand that. That's why it depends on the amount of data y would create, so it needs benchmarking, or at least checking index sizes on real data. It also affects data integrity; I will amend that into my answer.
