When aggregating a queryset, I noticed that if I apply an annotation first, I get a wrong result. I can't understand why.

The Code

from django.db.models import QuerySet, Max, F, ExpressionWrapper, DecimalField, Sum
from orders.models import OrderOperation

class OrderOperationQuerySet(QuerySet):
    def last_only(self) -> QuerySet:
        return self \
            .annotate(last_oo_pk=Max('order__orderoperation__pk')) \
            .filter(pk=F('last_oo_pk'))

    @staticmethod
    def _hist_price(orderable_field):
        return ExpressionWrapper(
            F(f'{orderable_field}__hist_unit_price') * F(f'{orderable_field}__quantity'),
            output_field=DecimalField())

    def ordered_articles_data(self):
        return self.aggregate(
            sum_ordered_articles_amounts=Sum(self._hist_price('orderedarticle')))

The Test

qs1 = OrderOperation.objects.filter(order__pk=31655)
qs2 = OrderOperation.objects.filter(order__pk=31655).last_only()
assert qs1.count() == qs2.count() == 1 and qs1[0] == qs2[0]  # shows that both querysets contain the same object

qs1.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.72')}  # expected result

qs2.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.01')}  # wrong result

How is it possible that this last_only annotation method can make the aggregation result different (and wrong)?

The "funny" thing is that it seems to happen only when the order contains articles that have the same hist_price (screenshot not included).

Side note

SQL Queries (note that these are the actual queries but the code above has been slightly simplified, which explains the presence below of COALESCE and "deleted" IS NULL.)

-- qs1.ordered_articles_data()

SELECT
    COALESCE(
        SUM(
            ("orders_orderedarticle"."hist_unit_price" * "orders_orderedarticle"."quantity")
        ),
        0) AS "sum_ordered_articles_amounts"
FROM "orders_orderoperation"
    LEFT OUTER JOIN "orders_orderedarticle"
        ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)

-- qs2.ordered_articles_data()

SELECT COALESCE(SUM(("__col1" * "__col2")), 0)
FROM (
    SELECT
        "orders_orderoperation"."id" AS Col1,
        MAX(T3."id") AS "last_oo_pk",
        "orders_orderedarticle"."hist_unit_price" AS "__col1",
        "orders_orderedarticle"."quantity" AS "__col2"
    FROM "orders_orderoperation" INNER JOIN "orders_order"
        ON ("orders_orderoperation"."order_id" = "orders_order"."id")
        LEFT OUTER JOIN "orders_orderoperation" T3
            ON ("orders_order"."id" = T3."order_id")
        LEFT OUTER JOIN "orders_orderedarticle"
            ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
    WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)
    GROUP BY
        "orders_orderoperation"."id",
        "orders_orderedarticle"."hist_unit_price",
        "orders_orderedarticle"."quantity"
    HAVING "orders_orderoperation"."id" = (MAX(T3."id"))
) subquery
  • Can you show the .query generated for both of these querysets? Commented Mar 4, 2019 at 14:17
  • Could you add output from calling explain() on both querysets? Commented Mar 4, 2019 at 14:22
  • @MalcolmWhite done Commented Mar 4, 2019 at 14:29
  • What is the field name for the goods title? I think the solution is to add it to the order by. Commented Mar 5, 2019 at 7:07

2 Answers

1

When you use an aggregate function in an annotation, the database requires a GROUP BY on every selected field that is not inside the function. You can see this in the subquery:

GROUP BY
    "orders_orderoperation"."id",
    "orders_orderedarticle"."hist_unit_price",
    "orders_orderedarticle"."quantity"
HAVING "orders_orderoperation"."id" = (MAX(T3."id"))

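As a result, articles with the same hist_unit_price and quantity fall into the same group and collapse into a single row of the subquery, and only groups whose operation id equals the max id pass the HAVING condition. So, based on your screenshot, one of the chocolate or the cafe never reaches the outer SUM.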
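To make the effect visible, here is a minimal sketch using the standard-library sqlite3 module instead of Django. The table names, the prices (stored as integer cents) and the order id 10 are made up for illustration; only the shape of the two queries follows the SQL in the question.

```python
import sqlite3

# Hypothetical, simplified schema: one order operation with three articles,
# two of which share the same price and quantity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderoperation (id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE orderedarticle (
        id INTEGER PRIMARY KEY,
        order_operation_id INTEGER,
        hist_unit_price INTEGER,  -- cents, to avoid float rounding
        quantity INTEGER
    );
    INSERT INTO orderoperation (id, order_id) VALUES (1, 10);
    INSERT INTO orderedarticle (order_operation_id, hist_unit_price, quantity)
    VALUES (1, 100, 1), (1, 100, 1), (1, 250, 1);
""")

# Same shape as qs1: a plain join, every article row is summed.
plain_sum = conn.execute("""
    SELECT SUM(a.hist_unit_price * a.quantity)
    FROM orderoperation oo
    LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
    WHERE oo.order_id = 10
""").fetchone()[0]

# Same shape as qs2: price and quantity end up in GROUP BY,
# so the two identical articles become one row before the outer SUM.
grouped_sum = conn.execute("""
    SELECT SUM(col1 * col2)
    FROM (
        SELECT oo.id, MAX(t3.id) AS last_oo_pk,
               a.hist_unit_price AS col1, a.quantity AS col2
        FROM orderoperation oo
        LEFT OUTER JOIN orderoperation t3 ON oo.order_id = t3.order_id
        LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
        WHERE oo.order_id = 10
        GROUP BY oo.id, a.hist_unit_price, a.quantity
        HAVING oo.id = MAX(t3.id)
    ) sub
""").fetchone()[0]

print(plain_sum, grouped_sum)
```

With this made-up data the plain query counts all three articles, while the grouped one counts the duplicated pair only once, which is the same kind of gap as between 3.72 and 3.01 in the question.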


5 Comments

Thanks for the explanation. Is there any way to solve this at the Django ORM level instead of digging into the SQL?
I don't clearly understand your main point. Why can't you use the solution from your first query? If you can't, maybe you can create a question with your models code and a description of the issue?
My main point is pretty simple: I want to fix my Django ORM request to get the right result. In the example above, that means qs2.ordered_articles_data() returning the right result (i.e. 3.72).
Why can't you use your solution qs1? And you should understand that the Django request is correct.
I think I was not clear in my question. qs1 and qs2 are NOT equivalent. qs2 has an additional filter that I need. But using this filter make the result wrong. qs1 is not a solution, it's just here to show that qs2 does not work as expected.
0

Splitting the query into subqueries with smaller joins avoids the problems that come with several joins to child objects: a possibly huge and unnecessary Cartesian product of independent sets, or a GROUP BY clause in the resulting SQL that is hard to control because several elements of the query contribute to it.

Solution: a subquery gets the primary keys of the last order operations. The outer query is then a simple one without added joins or groups, so it is not distorted by a possible aggregation on children.

    def last_only(self) -> QuerySet:
        max_ids = (self.values('order').order_by()
                   .annotate(last_oo_pk=Max('order__orderoperation__pk'))
                   .values('last_oo_pk')
                   )
        return self.filter(pk__in=max_ids)

test

ret = (OrderOperationQuerySet(OrderOperation).filter(order__in=[some_order])
       .last_only().ordered_articles_data())

Executed SQL (simplified by removing the app name prefix orders_ and the double quotes "):

SELECT CAST(SUM((orderedarticle.hist_unit_price * orderedarticle.quantity))
       AS NUMERIC) AS sum_ordered_articles_amounts
FROM orderoperation
LEFT OUTER JOIN orderedarticle ON (orderoperation.id = orderedarticle.order_operation_id)
WHERE (
  orderoperation.order_id IN (31655) AND
  orderoperation.id IN (
    SELECT MAX(U2.id) AS last_oo_pk
    FROM orderoperation U0
    INNER JOIN order U1 ON (U0.order_id = U1.id)
    LEFT OUTER JOIN orderoperation U2 ON (U1.id = U2.order_id)
    WHERE U0.order_id IN (31655)
    GROUP BY U0.order_id
  )
)
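The behaviour of this query shape can be checked outside Django with a small sqlite3 sketch. The schema, the two operations on a made-up order 10 and the integer prices in cents are assumptions for illustration only; the inner join to the order table from the SQL above is left out for brevity.

```python
import sqlite3

# Hypothetical data: order 10 has an old operation (id 1) and a last one (id 2).
# The last operation contains two articles with identical price and quantity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderoperation (id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE orderedarticle (
        id INTEGER PRIMARY KEY,
        order_operation_id INTEGER,
        hist_unit_price INTEGER,  -- cents, to avoid float rounding
        quantity INTEGER
    );
    INSERT INTO orderoperation (id, order_id) VALUES (1, 10), (2, 10);
    INSERT INTO orderedarticle (order_operation_id, hist_unit_price, quantity)
    VALUES (1, 500, 1), (2, 100, 1), (2, 100, 1), (2, 250, 1);
""")

# The "last operation" filter lives in an IN (...) subquery, so the outer
# query has no GROUP BY and every article row of operation 2 is summed.
last_only_sum = conn.execute("""
    SELECT SUM(a.hist_unit_price * a.quantity)
    FROM orderoperation oo
    LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
    WHERE oo.order_id = 10
      AND oo.id IN (
          SELECT MAX(u2.id)
          FROM orderoperation u0
          LEFT OUTER JOIN orderoperation u2 ON u0.order_id = u2.order_id
          WHERE u0.order_id = 10
          GROUP BY u0.order_id
      )
""").fetchone()[0]

print(last_only_sum)
```

The article of the older operation is excluded by the filter, and the two identical articles of the last operation are both counted because the grouping happens only inside the subquery.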

The original invalid SQL could be fixed by adding "orders_orderedarticle"."id" to GROUP BY, but only if last_only() and ordered_articles_data() are used together. That is not a readable approach.
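A quick way to convince yourself of the GROUP BY remark is the sqlite3 sketch below. It uses a made-up schema and data (order 10, prices in integer cents), not the real tables, and only mirrors the structure of the original grouped query with the article id added to the grouping.

```python
import sqlite3

# Hypothetical data: operation 2 is the last one for order 10 and holds
# two articles with the same price and quantity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderoperation (id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE orderedarticle (
        id INTEGER PRIMARY KEY,
        order_operation_id INTEGER,
        hist_unit_price INTEGER,  -- cents, to avoid float rounding
        quantity INTEGER
    );
    INSERT INTO orderoperation (id, order_id) VALUES (1, 10), (2, 10);
    INSERT INTO orderedarticle (order_operation_id, hist_unit_price, quantity)
    VALUES (1, 500, 1), (2, 100, 1), (2, 100, 1), (2, 250, 1);
""")

# Grouping also by the article primary key keeps identical articles
# in separate groups, so each one contributes to the outer SUM.
fixed_sum = conn.execute("""
    SELECT SUM(col1 * col2)
    FROM (
        SELECT oo.id, MAX(t3.id) AS last_oo_pk,
               a.hist_unit_price AS col1, a.quantity AS col2
        FROM orderoperation oo
        LEFT OUTER JOIN orderoperation t3 ON oo.order_id = t3.order_id
        LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
        WHERE oo.order_id = 10
        GROUP BY oo.id, a.id, a.hist_unit_price, a.quantity
        HAVING oo.id = MAX(t3.id)
    ) sub
""").fetchone()[0]

print(fixed_sum)
```

It works here, but the grouping has to know about the article table, which is why the subquery version of last_only() is easier to reuse.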
