When aggregating a queryset, I noticed that if I use an annotation before, I get a wrong result. I can't understand why.
The Code
from django.db.models import QuerySet, Max, F, ExpressionWrapper, DecimalField, Sum
from orders.models import OrderOperation
class OrderOperationQuerySet(QuerySet):
def last_only(self) -> QuerySet:
return self \
.annotate(last_oo_pk=Max('order__orderoperation__pk')) \
.filter(pk=F('last_oo_pk'))
@staticmethod
def _hist_price(orderable_field):
return ExpressionWrapper(
F(f'{orderable_field}__hist_unit_price') * F(f'{orderable_field}__quantity'),
output_field=DecimalField())
def ordered_articles_data(self):
return self.aggregate(
sum_ordered_articles_amounts=Sum(self._hist_price('orderedarticle')))
The Test
qs1 = OrderOperation.objects.filter(order__pk=31655)
qs2 = OrderOperation.objects.filter(order__pk=31655).last_only()
assert qs1.count() == qs2.count() == 1 and qs1[0] == qs2[0] # shows that both querysets contains the same object
qs1.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.72')} # expected result
qs2.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.01')} # wrong result
How is it possible that this last_only annotation method can make the aggregation result different (and wrong)?
The "funny" thing is that is seems to happen only when the order contains articles that have the same hist_price:

Side note
- I can confirm that the SQL created by Django ORM is probably wrong, because when I force execution of
last_only()and then I call aggregation in a second query, it works as expected. - https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#combining-multiple-aggregations could be an explanation?
SQL Queries
(note that these are the actual queries but the code above has been slightly simplified, which explains the presence below of COALESCE and "deleted" IS NULL.)
-- qs1.ordered_articles_data()
SELECT
COALESCE(
SUM(
("orders_orderedarticle"."hist_unit_price" * "orders_orderedarticle"."quantity")
),
0) AS "sum_ordered_articles_amounts"
FROM "orders_orderoperation"
LEFT OUTER JOIN "orders_orderedarticle"
ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)
-- qs2.ordered_articles_data()
SELECT COALESCE(SUM(("__col1" * "__col2")), 0)
FROM (
SELECT
"orders_orderoperation"."id" AS Col1,
MAX(T3."id") AS "last_oo_pk",
"orders_orderedarticle"."hist_unit_price" AS "__col1",
"orders_orderedarticle"."quantity" AS "__col2"
FROM "orders_orderoperation" INNER JOIN "orders_order"
ON ("orders_orderoperation"."order_id" = "orders_order"."id")
LEFT OUTER JOIN "orders_orderoperation" T3
ON ("orders_order"."id" = T3."order_id")
LEFT OUTER JOIN "orders_orderedarticle"
ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)
GROUP BY
"orders_orderoperation"."id",
"orders_orderedarticle"."hist_unit_price",
"orders_orderedarticle"."quantity"
HAVING "orders_orderoperation"."id" = (MAX(T3."id"))
) subquery
.querygenerated for both of these querysets?