
I've read quite a few posts on how to make a model field unique in Django, and they all seem to work. However, I have not seen any posts discussing an efficient way of avoiding adding duplicate entries to the database.

My model looks like this:

# from app.models
class TestModel(models.Model):
    text = models.TextField(unique=True, null=True)
    fixed_field = models.TextField()

The way I currently avoid adding duplicate entries without getting an error is as follows.

# from app.views
from django.db import IntegrityError

posts = ["one", "two", "three"]
fixed_field = "test"

for post in posts:
    try:
        TestModel(text=post, fixed_field=fixed_field).save()
    except IntegrityError:
        pass

If I did not do this, I would get an IntegrityError. Is there any way I could make this more efficient?

1 Answer

If you are adding items in bulk, you can avoid the duplicates up front: fetch the texts that already exist in the database, then build a list of TestModel instances that introduces no duplicates:

used_text = set(TestModel.objects.values_list('text', flat=True))

posts = ['one', 'two', 'three']
fixed_field = "test"

test_models = []
for post in posts:
    if post not in used_text:
        used_text.add(post)
        test_models.append(TestModel(text=post, fixed_field=fixed_field))

TestModel.objects.bulk_create(test_models)

The .bulk_create(..) [Django-doc] then creates all records in bulk, normally in a single query. If the number of elements is huge, the insert may be split over multiple queries, but each query will still insert a large number of records.
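The splitting behaviour can be illustrated with a small chunking helper (a plain-Python sketch, not Django's internals; in practice you would simply pass the batch_size parameter to .bulk_create(..)):

```python
def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Each chunk would correspond to one INSERT query, e.g.:
batches = list(chunked(["one", "two", "three", "four", "five"], 2))
# batches == [["one", "two"], ["three", "four"], ["five"]]
```

With Django this would look like TestModel.objects.bulk_create(test_models, batch_size=500), where Django does the chunking for you.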

Due to race conditions, however, the above can still fail: between fetching the existing texts and inserting the new ones, other queries can change the state of the database. Although that is not very likely, you should probably work with a retry mechanism that filters the conflicting TestModels out of the list again and re-inserts the rest.
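One way to structure such a retry mechanism is sketched below. The function and parameter names are hypothetical, and the database access is injected as callables so the idea stands on its own: with the model above, fetch_existing could be lambda: set(TestModel.objects.values_list('text', flat=True)), and bulk_insert could wrap TestModel.objects.bulk_create(..), raising on an IntegrityError:

```python
def insert_with_retry(candidates, fetch_existing, bulk_insert, max_retries=3):
    """Filter out texts that already exist, bulk-insert the rest, and on a
    uniqueness conflict (another writer won the race) re-fetch and retry.

    fetch_existing() -> set of texts already in the database
    bulk_insert(texts) -> inserts the texts, raising on a conflict
    """
    remaining = list(dict.fromkeys(candidates))  # drop in-batch duplicates
    for _ in range(max_retries):
        used = fetch_existing()
        remaining = [text for text in remaining if text not in used]
        if not remaining:
            return  # everything already present
        try:
            bulk_insert(remaining)
            return
        except Exception:  # with Django, catch django.db.IntegrityError here
            continue  # re-filter against the new database state and retry
    raise RuntimeError("could not insert after retries")
```

On Django 2.2 and later there is also bulk_create(test_models, ignore_conflicts=True), which pushes the conflict handling into the database itself and avoids the retry loop entirely, at the cost of not knowing which rows were skipped.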


3 Comments

Thank you very much for your response. Just for my own intuition, the solution I have currently would be "race condition proof", right? Although I can imagine that it is slower than properly implementing some sort of retry-mechanism.
@MennoVanDijk: if the database is race-condition proof, yes :), although if you have a large number of items to insert (1000+), it will take some time, since you make a roundtrip to the database each time. If the database is not modified that often (so collisions are unlikely to happen), then inserting in bulk will normally be faster. Of course, if the amount of concurrent inserts is very high, then eventually it is possible that you keep retrying the insert and fail each time.
Alright, very new to webdev so still looking for best-practice ways of solving rather trivial issues like these. Thank you for your advice, I will see whether I can find a proper fix in due time.
