This is what I want Django to generate in SQL:

select avg(subquery.countval) from (
select count(something) countval,user_id from foo group by user_id
 ) subquery
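
To make the target concrete, here is a minimal sketch that runs the same statement against an in-memory SQLite database using only the standard library. The table layout and the sample rows are assumptions made up for illustration; only the shape of the query comes from the question.

```python
import sqlite3

# Hypothetical data: user 1 has 3 rows, user 2 has 1 row, user 3 has 2 rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, user_id INTEGER, something TEXT)"
)
conn.executemany(
    "INSERT INTO foo (user_id, something) VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (3, "f")],
)

# Inner query: one row per user with that user's row count.
# Outer query: the average of those per-user counts.
TARGET_SQL = """
    SELECT AVG(subquery.countval) FROM (
        SELECT COUNT(something) AS countval, user_id FROM foo GROUP BY user_id
    ) subquery
"""
average = conn.execute(TARGET_SQL).fetchone()[0]
print(average)  # (3 + 1 + 2) / 3
```

The result is the mean number of rows per user, not the mean of any column in foo, which is why the aggregate has to be applied on top of the grouped subquery.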

How I think this should work with Django based on the Annotated aggregation documentation:

Foo.objects.all().values('user_id').\
                 annotate(countval=Count('id')).\
                 aggregate(Avg('countval'))

The problem is that Django doesn't generate the correct query: the outer SELECT comes out with no columns. You get something like:

SELECT FROM (SELECT foo.user_id as user_id,COUNT(foo.id) 
 AS countval from foo 
 group by foo.user_id)

Any ideas? I debugged through the source but it isn't obvious what is going wrong.

  • Just a short remark: Your order_by() statement when using the django ORM is obsolete, because you do not provide the ordering column... Commented Jun 26, 2013 at 18:37
  • Fixed; I had an order_by() in my real code that I was attempting to clear. Commented Jun 26, 2013 at 18:41

1 Answer

I wasn't able to do it with pure Django code. This is the best I could do while relying on Django as much as possible instead of raw SQL.

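from django.db import connection
from django.db.models import Count

def get_average_count(klass, field_name):
    # Inner query: one row per distinct field_name value with its row count.
    foo = klass.objects.values(field_name).annotate(countval=Count('id'))
    # Take the SQL and its parameters separately so that any values are still
    # escaped by the database driver rather than inlined by str(foo.query).
    sql, params = foo.query.sql_with_params()
    query = "SELECT AVG(subquery.countval) FROM (%s) subquery" % sql
    cursor = connection.cursor()
    cursor.execute(query, params)
    result = cursor.fetchone()[0]
    # AVG over zero groups is NULL, which comes back as None.
    return float(result) if result is not None else None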

This will execute essentially the SQL statement you said you wanted to generate. The inner query is still built by Django, so it follows whichever backend your default connection uses, and the function is reusable (yay DRY) for any model where you want the average number of rows per value of a field, such as a ForeignKey.
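
The wrapping step itself can be tried without a Django project. Below is a minimal sketch using the standard library's sqlite3 module; the helper name average_of_subquery, the table, and the sample rows are all hypothetical. It shows the inner statement's parameters being passed to the driver separately from the SQL text, which matters as soon as the inner query has a filter, and it shows that AVG over no rows comes back as None.

```python
import sqlite3

def average_of_subquery(conn, inner_sql, params=(), column="countval"):
    """Wrap inner_sql in SELECT AVG(...) and run it, passing params separately."""
    # column is interpolated into the SQL text, so it must be a trusted name.
    outer_sql = "SELECT AVG(subquery.{0}) FROM ({1}) subquery".format(
        column, inner_sql
    )
    row = conn.execute(outer_sql, params).fetchone()
    # AVG over zero rows yields NULL, which arrives as None.
    return None if row[0] is None else float(row[0])

# Hypothetical data: user 1 has 3 rows, user 2 has 1 row, user 3 has 2 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany(
    "INSERT INTO foo (user_id) VALUES (?)",
    [(1,), (1,), (1,), (2,), (3,), (3,)],
)

inner = (
    "SELECT user_id, COUNT(id) AS countval "
    "FROM foo WHERE user_id > ? GROUP BY user_id"
)
filtered_average = average_of_subquery(conn, inner, (1,))  # users 2 and 3 only
empty_average = average_of_subquery(conn, inner, (99,))    # no matching users
```

Keeping the parameters out of the SQL string leaves quoting to the database driver, so the same wrapper works whether or not the inner query is filtered.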

If you really don't want to use raw SQL, another option is to calculate the average in Django:

from __future__ import division  # true division on Python 2, so no cast to float
from django.db.models import Count

def get_average_count(klass, field_name):
    counts = list(
        klass.objects.values(field_name)
        .annotate(countval=Count('id'))
        .values_list('countval', flat=True)
    )
    # Mean of the per-group counts; 0 when there are no groups.
    return sum(counts) / len(counts) if counts else 0
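
The in-Python calculation above boils down to "count per group, then take the mean of those counts". Here is a minimal sketch of that arithmetic with no database involved; the function name average_group_size and the sample user_id values are assumptions for illustration only.

```python
from collections import Counter

def average_group_size(keys):
    """Return the mean number of items per distinct key, or 0 for no keys."""
    counts = Counter(keys)  # key -> number of occurrences
    if not counts:
        return 0
    return sum(counts.values()) / len(counts)

# Hypothetical user_id column: user 1 appears 3 times, user 2 once, user 3 twice.
user_ids = [1, 1, 1, 2, 3, 3]
average = average_group_size(user_ids)  # (3 + 1 + 2) / 3
```

This assumes Python 3, where / is always true division.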

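This version pulls one count per group into Python instead of letting the database return a single value, so you might want to check the performance difference if you're planning to have large datasets in your database.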
