I'm trying to improve some existing code which originally took 3 minutes to prepare a large dataTable (then returned by Ajax). The old code iterated over a large queryset, gathering information from a variety of related objects. From what I've read, and from monitoring the SQL log, iterating over querysets is generally a bad idea, because SQL is executed for each item. Instead, I've been using values() to gather the information in a single SQL statement, then iterating through the result. Using this technique, I've reduced the execution time to under 15 seconds (and I'm still not done). However, because I'm no longer using model objects, I can't use get_FOO_display(). Is there a way to use this functionality while using values()?

Simplified, the original was:

for user in users:
   data.append(user.get_name_display())  # Appends 'Joe Smith'
return data

And the new code is:

for user in users.values('name'):
   data.append(user['name'])  # Appends 'JSmith001', which is incorrect
return data

Also, if there's some other way to preserve the creation of model objects yet only requires a single SQL statement on the backend, I'd love to know about it. Thanks!

  • Need to see more code. This is too general to answer. Commented Mar 12, 2012 at 21:25
  • Sorry, I added comments above, but I'm not sure what else you want to see. Commented Mar 12, 2012 at 21:43

2 Answers

6

In general, it'll probably be better and easier to use a Manager-based query that returns model objects. It sounds like the issue with your original approach is not that you were iterating over your queryset (as @ahmoo says, this isn't a performance issue), but that within your iteration loop you were getting additional related objects, requiring one or more additional queries for each record.

There are several ways to improve performance with queries that still return model instances:

  • It sounds like the most relevant is select_related(), which will effectively do a table join on the initial query to include data for all objects related by foreign keys.

  • If that's insufficient, you can also add data to your model instances with extra(), which allows you to stick subqueries into your SQL.

  • And if all that fails, you can perform raw SQL queries using the .raw() method on a Manager instance, which will still return model instances.

Basically, if you can do it in SQL in a way that gives you one row per instance, there's a way to do it in Django and get model instances back.

To answer your original question, though, you can get the display name through the Field class - it's just ugly:

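def get_field_display(klass, field, value):
    # Look up the Field object by name on the model class
    f = klass._meta.get_field(field)
    # flatchoices is a list of (stored value, display label) pairs;
    # fall back to the raw value if no choice matches
    return dict(f.flatchoices).get(value, value)

# usage
get_field_display(User, 'name', 'JSmith001')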
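
When this is applied to every row returned by values(), it may be worth building the choices dictionary once rather than on each call. The following is only a framework-free sketch of that idea: NAME_CHOICES, rows, and display_values are hypothetical stand-ins (plain lists and dicts) for the field's flatchoices and for the output of users.values('name'). In real code, the choices would presumably come from User._meta.get_field('name').flatchoices as above.

```python
# Hypothetical stand-in for a field's flatchoices:
# (stored value, display label) pairs.
NAME_CHOICES = [
    ("JSmith001", "Joe Smith"),
    ("ADoe002", "Ann Doe"),
]

# Hypothetical stand-in for list(users.values('name')): one dict per row.
rows = [{"name": "JSmith001"}, {"name": "ADoe002"}, {"name": "Unknown999"}]


def display_values(rows, key, choices):
    """Map each row's stored value to its label, building the lookup once."""
    labels = dict(choices)
    # Fall back to the raw value when no choice matches.
    return [labels.get(row[key], row[key]) for row in rows]


data = display_values(rows, "name", NAME_CHOICES)
print(data)
```

The fallback to the raw value mirrors the .get(value, value) call in the helper above, so rows with a value outside the choices pass through unchanged.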

7 Comments

Hm. When I do User.objects.all() and then iterate through that doing something like print user.name, it only executes a single SQL statement. However, when I do print user.team.name, it executes an SQL statement for each user, whether or not I use select_related.
Is user.team a foreign-key relationship, or a many-to-many relationship? select_related() only follows ForeignKey fields, I think.
Sorry, it's a many-to-many relationship. Right, I had read that, and tried prefetch_related briefly before realizing it's currently only in the development version. I think this is why I ended up going with values() to prepopulate my data into dicts, which is working great (down to 2.7 seconds with final fixes in) except that I can't use get_FOO_display.
See my answer to the get_FOO_display issue, above.
Thanks! That works. I'll look into alternate ways to use Manager-based queries for one of the next 5 pages I'm refactoring. I've read in several places that one way to speed things up is to use values() and avoid building Django's ORM objects, but things definitely get a lot trickier.
3

iterating over querysets is generally a bad idea, because SQL is executed for each item

That's not true. Below is taken from the official docs:

A QuerySet is iterable, and it executes its database query the first time you iterate over it

I think the problem has to do with the definition of the users from your code. What did you assign to it?

3 Comments

We do a lot of foreign key following as well, meaning we hit 'user.team.name'. You're right that they only execute a single query on simple objects, but when I try debugging this now, as soon as I follow a foreign key, there's a separate SQL statement in the log for each user.
@Nathan - Are you trying to access user.team.name or user.name? In your original post, it seems like you need user.name, but then you mentioned user.team.name. It would probably be best if you could post the definition of the Team model and expand on what you want to achieve.
Both, as well as a host of different things from different relationships, which are a combination of many-to-many and foreign key. I think there are two primary questions now, 1) is it possible to get the display name for a model without an instance of that model object, and 2) how can I better iterate through a queryset without multiple SQL calls. I'm thinking the answer to #1 is 'no', and the answer to #2 is either an entirely different SO post, or more code introspection.
