I'm optimizing a slow page load in our (first) Django project. The project does test status management: there are protocols, which contain cases, which in turn have planned executions. Currently the code is:
    protocols = Protocol.active.filter(team=team, release=release)
    cases = Case.active.filter(protocol__in=protocols)
    caseCount = cases.count()
    plannedExecs = Planned_Exec.active.filter(case__in=cases, team=team, release=release)

    # Start aggregating test suite information
    # pgi Model
    testSuite['pgi_model'] = []
    for pgi in PLM.objects.filter(release=release).values('pgi_model').distinct():
        plmForPgi = PLM.objects.filter(pgi_model=pgi['pgi_model'])
        peresults = plannedExecs.filter(plm__in=plmForPgi).count()
        if peresults > 0:
            try:
                testSuite['pgi_model'].append((pgi['pgi_model'], "", "", peresults,
                                               int(peresults / float(testlistCount) * 100)))
            except ZeroDivisionError:
                testSuite['pgi_model'].append((pgi['pgi_model'], "", "", peresults, 0))

    # Browser
    testSuite['browser'] = []
    for browser in BROWSER_OPTIONS:
        peresults = plannedExecs.filter(browser=browser[0]).count()
        try:
            testSuite['browser'].append((browser[1], "", "", peresults,
                                         int(peresults / float(testlistCount) * 100)))
        except ZeroDivisionError:
            testSuite['browser'].append((browser[1], "", "", peresults, 0))

    # ... more categories are aggregated the same way below, then the report is generated ...
This code issues a lot of SQL statements. PLM.objects.filter(release=release).values('pgi_model').distinct() returns about 50 values, and the two filter operations inside the loop each trigger an SQL statement per value, meaning 100 SQL statements for this one for loop. (Also, it seems like that query should use values_list with flat=True.)
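That is, something like the sketch below, which would also let the intermediate plmForPgi queryset disappear by following the plm relation directly. (I'm assuming plm is a ForeignKey from Planned_Exec to PLM, which the plm__in filter suggests.)

    # One query, returning a flat list of strings instead of one-key dicts:
    pgi_models = PLM.objects.filter(release=release) \
                            .values_list('pgi_model', flat=True).distinct()
    for pgi_model in pgi_models:
        # Still one COUNT query per iteration, but no intermediate queryset:
        peresults = plannedExecs.filter(plm__pgi_model=pgi_model).count()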
Since all I want is information about the relevant cases and planned executions, I think I really only need to retrieve those two tables once and then analyze them in Python. Using filter() and count() seemed like the obvious solution at the time, but I'm wondering if I wouldn't be better off building a dict of the relevant case and planned-execution fields using .values() and then analyzing that instead, so as to avoid the unnecessary SQL statements. Any helpful advice? Thanks!
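Roughly, what I have in mind is the sketch below: one query pulling just the needed columns, then counting in Python. (The field names are guesses from the code above: a browser column and a plm foreign key on Planned_Exec, with testlistCount computed earlier in the view.)

    from collections import Counter

    # One query: fetch only the columns the report needs.
    rows = list(plannedExecs.values('browser', 'plm__pgi_model'))

    browser_counts = Counter(r['browser'] for r in rows)
    pgi_counts = Counter(r['plm__pgi_model'] for r in rows)

    def percent(count):
        # Same guard as the try/except ZeroDivisionError above.
        return int(count / float(testlistCount) * 100) if testlistCount else 0

    testSuite['browser'] = [
        (label, "", "", browser_counts[value], percent(browser_counts[value]))
        for value, label in BROWSER_OPTIONS
    ]
    testSuite['pgi_model'] = [
        (model, "", "", n, percent(n)) for model, n in pgi_counts.items()
    ]

That would be one SELECT plus a few passes over an in-memory list, instead of one COUNT query per category value.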
Edit: To profile this and see where the time goes, I'm using the Django Debug Toolbar. It reports over 200 queries, each of which runs extremely quickly, so the SQL itself accounts for very little of the total time. However, could it be that the SQL executes quickly, but the overhead of constructing each ORM queryset adds up, given that it happens over 200 times? I previously refactored a page that took 3 minutes to load to use values() instead of ORM objects, which got the page load down to 2.7 seconds and 5 SQL statements.
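If the counting turns out to be better done in the database, I gather the grouping could also be pushed into SQL with annotate(), so each category costs exactly one GROUP BY query. A sketch, with the same guessed field names as above:

    from django.db.models import Count

    # One GROUP BY query covering every pgi_model value at once:
    pgi_rows = (plannedExecs.values('plm__pgi_model')
                            .annotate(n=Count('pk'))
                            .order_by())  # clear default ordering so it can't break the grouping

    testSuite['pgi_model'] = [
        (row['plm__pgi_model'], "", "", row['n'],
         int(row['n'] / float(testlistCount) * 100) if testlistCount else 0)
        for row in pgi_rows
    ]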