I'm optimizing a slow page load in our (first) Django project. The project does test status management: there are protocols, which contain cases, which in turn have planned executions. Currently the code is:
    protocols = Protocol.active.filter(team=team, release=release)
    cases = Case.active.filter(protocol__in=protocols)
    caseCount = cases.count()
    plannedExecs = Planned_Exec.active.filter(case__in=cases, team=team, release=release)

    # Start aggregating test suite information
    # pgi Model
    testSuite['pgi_model'] = []
    for pgi in PLM.objects.filter(release=release).values('pgi_model').distinct():
        plmForPgi = PLM.objects.filter(pgi_model=pgi['pgi_model'])
        peresults = plannedExecs.filter(plm__in=plmForPgi).count()
        if peresults > 0:
            try:
                testSuite['pgi_model'].append((pgi['pgi_model'], "", "", peresults,
                                               int(peresults / float(testlistCount) * 100)))
            except ZeroDivisionError:
                testSuite['pgi_model'].append((pgi['pgi_model'], "", "", peresults, 0))

    # Browser
    testSuite['browser'] = []
    for browser in BROWSER_OPTIONS:
        peresults = plannedExecs.filter(browser=browser[0]).count()
        try:
            testSuite['browser'].append((browser[1], "", "", peresults,
                                         int(peresults / float(testlistCount) * 100)))
        except ZeroDivisionError:
            testSuite['browser'].append((browser[1], "", "", peresults, 0))

    # ... more categories are aggregated the same way below, then the report is generated ...
This code issues a lot of SQL statements. PLM.objects.filter(release=release).values('pgi_model').distinct() returns about 50 values, and the two filter operations inside the loop each trigger an SQL statement per value, meaning 100 SQL statements for this one for loop. (Also, it seems like that query should use values_list with flat=True.)
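That is, something like the sketch below, which would also let the intermediate plmForPgi queryset disappear by following the plm relation directly. (I'm assuming plm is a ForeignKey from Planned_Exec to PLM, which the plm__in filter suggests.)

    # One query, returning a flat list of strings instead of one-key dicts:
    pgi_models = PLM.objects.filter(release=release) \
                            .values_list('pgi_model', flat=True).distinct()
    for pgi_model in pgi_models:
        # Still one COUNT query per iteration, but no intermediate queryset:
        peresults = plannedExecs.filter(plm__pgi_model=pgi_model).count()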
Since all I want is information about the relevant cases and planned executions, I think I really only need to retrieve those two tables once and then analyze them in Python. Using filter() and count() seemed like the obvious solution at the time, but I'm wondering if I wouldn't be better off building a dict of the relevant case and planned-execution fields using .values() and then analyzing that instead, so as to avoid the unnecessary SQL statements. Any helpful advice? Thanks!
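Roughly, what I have in mind is the sketch below: one query pulling just the needed columns, then counting in Python. (The field names are guesses from the code above: a browser column and a plm foreign key on Planned_Exec, with testlistCount computed earlier in the view.)

    from collections import Counter

    # One query: fetch only the columns the report needs.
    rows = list(plannedExecs.values('browser', 'plm__pgi_model'))

    browser_counts = Counter(r['browser'] for r in rows)
    pgi_counts = Counter(r['plm__pgi_model'] for r in rows)

    def percent(count):
        # Same guard as the try/except ZeroDivisionError above.
        return int(count / float(testlistCount) * 100) if testlistCount else 0

    testSuite['browser'] = [
        (label, "", "", browser_counts[value], percent(browser_counts[value]))
        for value, label in BROWSER_OPTIONS
    ]
    testSuite['pgi_model'] = [
        (model, "", "", n, percent(n)) for model, n in pgi_counts.items()
    ]

That would be one SELECT plus a few passes over an in-memory list, instead of one COUNT query per category value.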
Edit: To profile this and see where the time goes, I'm using the Django Debug Toolbar. It reports over 200 queries, each of which runs extremely quickly, so the SQL itself accounts for very little of the total time. However, could it be that the SQL executes quickly, but the overhead of constructing each ORM queryset adds up, given that it happens over 200 times? I previously refactored a page that took 3 minutes to load to use values() instead of ORM objects, which got the page load down to 2.7 seconds and 5 SQL statements.
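If the counting turns out to be better done in the database, I gather the grouping could also be pushed into SQL with annotate(), so each category costs exactly one GROUP BY query. A sketch, with the same guessed field names as above:

    from django.db.models import Count

    # One GROUP BY query covering every pgi_model value at once:
    pgi_rows = (plannedExecs.values('plm__pgi_model')
                            .annotate(n=Count('pk'))
                            .order_by())  # clear default ordering so it can't break the grouping

    testSuite['pgi_model'] = [
        (row['plm__pgi_model'], "", "", row['n'],
         int(row['n'] / float(testlistCount) * 100) if testlistCount else 0)
        for row in pgi_rows
    ]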