2

MODELS

class ModelA(models.Model):
    name = models.CharField()

class ModelB(models.Model):
    MY_CHOICES = (
        ('X', 'X'),
        ('Y', 'Y'),
        ('Z', 'Z'),
    )
    modela = models.ForeignKey(ModelA, on_delete=models.CASCADE)
    txt_1 = models.CharField(choices=MY_CHOICES)
    txt_2 = models.CharField(choices=MY_CHOICES)

Given the simplified example above, how could I count how many times each of the choices values has been recorded given that there are Two fields that need to be counted?

Ideally the result would be something along the lines of:

{'X': 15, 'Y': 27, 'Z': 89}

I have tried the following, however in my real model, I have around 20 fields to count across and this is not giving the results I was hoping for:

ModelA.objects.values('modelb__txt1', 'modelb__txt2').annotate(Count('modelb__txt1', 'modelb__txt2'))

I have previously created huge dictionaries and manually sorted/counted the values, but this is now unmanageable and ugly.

3
  • Say there is a row (X, Y), does this mean we count this as one for both X and Y? Commented Jul 27, 2018 at 12:06
  • What in case it is (X, X)? Does this count for 1 to X? Or twice? Commented Jul 27, 2018 at 12:06
  • @WillemVanOnsem - We would count each value so (X, Y) would be {'X': 1, 'Y':1} and (X, X) would be {'X': 2, 'Y': 0}. Commented Jul 27, 2018 at 12:11

1 Answer 1

1

With one query (for a limited number of columns)

With one query, we can do this like:

from django.db.models import Count

qs = ModelB.objects.values('txt_1', 'txt_2').annotate(
    cnt=Count('id')
).order_by('txt_1', 'txt_2')

But now we are still not there, since now we have for every combination of txt_1 and txt_2 the number of elements. We want to "flatten" this to every individual choice. We can do this for example by constructing a Counter [Python-doc]:

from collections import Counter

result = Counter()
for row in qs:
    result[row['txt_1']] += row['cnt']
    result[row['txt_2']] += row['cnt']

For every row of this QuerySet, we thus add the number (cnt) to the two keys. This thus means that we count a row where both txt_1, and txt_2 have value 'X' twice.

A Counter is a subclass of a dictionary, but if you want to cast it to a dictionary, you can later write:

result_dict = dict(result)

Options that are never selected, will not appear in the dictionary, since the queryset will not contain these, and thus we will never add these to the Counter. But we can of course post-process the dictionary, and add 0 for these.

With n queries (with n the number of columns)

The above will work usually quite well. If the number of choices is however quite large, the processing will be more on the Python side, which is typically slower. We can then make a linearization, and work with two queries:

from collections import Counter
from django.db.models import Count

result = Counter()
for col in ['txt_1', 'txt_2']:
    qs = ModelB.objects.values(col).annotate(cnt=Count('id')).order_by(col)
    result.update({q[col]: q['cnt'] for q in qs})

This will here reduce in two queries. But in this case each query will (at most) return three rows. Whereas the other approach will result in one query returning (at most) nine rows. For a small amount of rows, that is not a problem. But the number of cases can easily grow exponential in the number of columns.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the quick answer, Ill give this a try now!
Thanks again, the single query approach appears to be working great!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.