86

I have two models A and B. All B objects have a foreign key to an A object. Given a set of A objects, is there anyway to use the ORM to get a set of B objects containing the most recent object created for each A object.

Here's an simplified example:

class Bakery(models.Model):
    town = models.CharField(max_length=255)

class Cake(models.Model):
    bakery = models.ForeignKey(Bakery, on_delete=models.CASCADE)
    baked_at = models.DateTimeField()

So I'm looking for a query that returns the most recent cake baked in each bakery in Anytown, USA.

1
  • 7
    I'd love to see that too :-) Commented Jan 15, 2010 at 20:30

7 Answers 7

42

Starting from Django 1.11 and thanks to Subquery and OuterRef, we can finally build a latest-per-group query using the ORM.

hottest_cakes = Cake.objects.filter(
    baked_at=Subquery(
        (Cake.objects
            .filter(bakery=OuterRef('bakery'))
            .values('bakery')
            .annotate(last_bake=Max('baked_at'))
            .values('last_bake')[:1]
        )
    )
)

#BONUS, we can now use this for prefetch_related()
bakeries = Bakery.objects.all().prefetch_related(
    Prefetch('cake_set',
        queryset=hottest_cakes,
        to_attr='hottest_cakes'
    )
)

#usage
for bakery in bakeries:
    print 'Bakery %s has %s hottest_cakes' % (bakery, len(bakery.hottest_cakes))
Sign up to request clarification or add additional context in comments.

5 Comments

This worked perfectly, though my use-case was slightly different. What I like about this approach is 1) it keeps the resulting queryset in the target model instance, and 2) it doesn't exclude model instances that don't have related data (in the context of the question, bakeries that haven't baked anything yet).
you are smartest guy
So this gives all the hotests Cakes as a QuerySet. Very interesting. I also needed this, but then needed to filter on them in the bakery, based on an aggregate ('cake__topping'). It seems to work fine.
If you want multiple fields at once, see the answer here (Django 3.2+): stackoverflow.com/a/73088863/6063532
@suayipuzulmez this looks interesting, i will definitely give it a try :)
39

As far as I know, there is no one-step way of doing this in Django ORM, but you can split it into two queries:

from django.db.models import Max

bakeries = Bakery.objects.annotate(
    hottest_cake_baked_at=Max('cake__baked_at')
) 
hottest_cakes = Cake.objects.filter(
    baked_at__in=[b.hottest_cake_baked_at for b in bakeries]
)

If id's of cakes are progressing along with bake_at timestamps, you can simplify and disambiguate the above code (in case two cakes arrives at the same time you can get both of them):

from django.db.models import Max

hottest_cake_ids = Bakery.objects.annotate(
    hottest_cake_id=Max('cake__id')
).values_list('hottest_cak‌​e_id', flat=True)

hottest_cakes = Cake.objects.filter(id__in=hottest_cake_ids)

BTW credits for this goes to Daniel Roseman, who once answered similar question of mine:

http://groups.google.pl/group/django-users/browse_thread/thread/3b3cd4cbad478d34/3e4c87f336696054?hl=pl&q=

If the above method is too slow, then I know also second method - you can write custom SQL producing only those Cakes, that are hottest in relevant Bakeries, define it as database VIEW, and then write unmanaged Django model for it. It's also mentioned in the above django-users thread. Direct link to the original concept is here:

http://web.archive.org/web/20130203180037/http://wolfram.kriesing.de/blog/index.php/2007/django-nice-and-critical-article#comment-48425

Hope this helps.

5 Comments

I'll probably go with the second set of queries you suggested. Thanks.
This would be more efficient is you used a value_list for the first query:hottest_cake_ids = Bakery.objects.annotate(hottest_cake_id=Max('cake__id')).values_list('hottest_cake_id', flat=True); hottest_cakes = Cake.objects.filter(id__in=hottest_cake_ids)
Also, if you happen to be using PostGreSQL, there is a one step solution.
Doesn't the first solution create an issue where the the latest date for one is before the latest date of another, but exists in another? A = [1, 2, 3], B = [1, 2]. A latest = 3, B latest = 2. The first query seems to get A's 2 and 3, as well as B's 2.
Starting from Django 1.11 now there is a one-way step of doing it. Check my answer.
25

If you happen to be using PostGreSQL, you can use Django's interface to DISTINCT ON:

recent_cakes = Cake.objects.order_by('bakery__id', '-baked_at').distinct('bakery__id')

As the docs say, you must order by the same fields that you distinct on. As Simon pointed out below, if you want to do additional sorting, you'll have to do it in Python-space.

5 Comments

Love the approach - thanks. Just did a minor fix concerning the final ordering. Depending on the total size of the QS, this may be better or worse than the accepted answer. In my case: better :)
I think that is an unnecessary complication to the code, and goes beyond what is answered. I'm going to assume that folks can figure out how to sort the resulting data.
I played a lot with similar problem, trying Max annotations and filtering on them, but they were finally failing on db-side because of improper sql after django optimizer removes order_by (when using result as filter subquery or when aggregating, ex .count()). This solution doesn't break all the things when fetching recent_cakes.count() and don't throw errors when doing Cake.objects.filter(pk__in=recent_cackes).filter(other_conditions),but the latest example return random cakes per bakery that satisfy other_conditions (not hottest!), because django removes order_by from subquery :(
yeah, for that reason, I think that if you aren't using postGreSQL, Tomasz Zielinski's answer is the way to go.
Doesn't this order primarily by the bakery_id order, not the baked at date, which throws things off?
5

This should do the job:

from django.db.models import Max
Bakery.objects.annotate(Max('cake__baked_at'))

3 Comments

I haven't tested yet, but that looks like it will annotate the time that each bakery most recently baked a cake. I'm looking for for the actual cake objects. Am I misinterpreting your answer?
Yes, you're right. I had forgotten the previous answer I posted for Tomasz :-)
I believe this will only work if sorting of cakes by ids and by date result the same ordering. In a generic case where the primary key sequence does not correspond to the chronological order as defined by the date field, this will not work.
4

I was fighting with similar problem and finally come to following solution. It does not rely on order_by and distinct so can be sorted as desired on db-side and also can be used as nested query for filtering. I also believe this implementation is db engine independent, because it's based on standard sql HAVING clause. The only drawback is that it will return multiple hottest cakes per bakery, if they are baked in that bakery at exactly same time.

from django.db.models import Max, F

Cake.objects.annotate(
    # annotate with MAX "baked_at" over all cakes in bakery
    latest_baketime_in_bakery=Max('bakery__cake_set__baked_at')
    # compare this cake "baked_at" with annotated latest in bakery
).filter(latest_baketime_in_bakery__eq=F('baked_at'))

Comments

0
Cake.objects.filter(bakery__town="Anytown").order_by("-created_at")[:1]

I haven't built out the models on my end, but in theory this should work. Broken down:

  • Cake.objects.filter(bakery__town="Anytown") Should return any cakes whom belong to "Anytown", assuming the country is not part of the string. The double underscores between bakery and town allow us to access the town property of bakery.
  • .order_by("-created_at") will order the results by their created date, most recent first (take note of the - (minus) sign in "-created_at". Without the minus sign, they'd be ordered by oldest to most recent.
  • [:1] on the end will return only the 1st item in the list which is returned (which would be a list of cakes from Anytown, sorted by most recent first).

Note: This answer is for Django 1.11. This answer modified from Queries shown here in Django 1.11 Docs.

Comments

0

@Tomasz Zieliński solution above did solve your problem but it did not solve mine, because I still need to filter the Cake. So here is my solution

from django.db.models import Q, Max

hottest_yellow_round_cake = Max('cake__baked_at', filter=Q(cake__color='yellow', cake__shape='round'))

bakeries = Bakery.objects.filter(town='Chicago').annotate(
    hottest_cake_baked_at=hottest_yellow_round_cake
)

hottest_cakes = Cake.objects.filter(
    baked_at__in=[b.hottest_cake_baked_at for b in bakeries]
)

With this approach, you can also implement other things like Filter, Ordering, Pagination for Cakes

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.