MySQL joined subqueries optimization

Question

I have the following query in MySQL:

    select * from
    (
        select asiento, fecha, sum(debe) as debe, sum(haber) as haber
        from apuntes
        where apuntes.sobreescrito is null
        group by asiento, fecha
        order by fecha, asiento
    ) asientos
    left join
    (
        select id_diario, asiento, fecha, sum(debe) as debe, sum(haber) as haber
        from apuntes
        where apuntes.sobreescrito is not null
        group by asiento, fecha, id_diario
        order by fecha, asiento
    ) asientos_antiguos
    on asientos.asiento = asientos_antiguos.asiento and asientos.fecha = asientos_antiguos.fecha
    where
        asientos_antiguos.debe <> asientos.debe
        or
        asientos_antiguos.haber <> asientos.haber

The first subquery (asientos) returns around 20k records, and the second, in a normal situation, should return no more than 20k records. That gives an acceptable query time of 3-4 seconds. In theory, though, the overwritten records could be duplicated on each overwrite operation, so I'm testing with 100k records, and the query takes around 30 seconds (not acceptable).

At this point, I tried creating indexes on the fields "asiento" and "fecha", but the subqueries don't benefit from them. I also created a view for each subquery, hoping I could create an index on those views, but the limitations of views include "no indexes".
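As a rough illustration of that attempt, a composite index on the two fields and an EXPLAIN of one subquery might look like the sketch below. The index name is made up for the example, the column order is an assumption, and the exact index definitions actually tried are not stated here.

```sql
-- Hypothetical index name; column order (asiento, fecha) is an assumption.
CREATE INDEX idx_apuntes_asiento_fecha ON apuntes (asiento, fecha);

-- EXPLAIN reports, for each table or derived table, which index
-- (if any) the optimizer chose; check the "key" column of its output.
EXPLAIN
SELECT asiento, fecha, SUM(debe) AS debe, SUM(haber) AS haber
FROM apuntes
WHERE sobreescrito IS NULL
GROUP BY asiento, fecha;
```

Running EXPLAIN on the full joined query as well should show whether the join between the two derived tables is using any key at all.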

Any help would be appreciated.

EDIT 1

OK, I will try to explain what I'm trying to achieve. Feel free to correct my English; I'm going to use financial terms that I probably don't know well.

I have developed a web app that reads Excel files containing book entries (each file usually has 20k records) and saves those book entries to a table (apuntes, in my case).

Sometimes some of those entries are overwritten: this happens when the fields "fecha" and "asiento" are the same and the field "id_diario" is different. (Note: each Excel workbook generates a set of book entries with its own "id_diario", so I can tell older records apart.)

Up to this point everything works, but now I have to generate a report showing whether any overwritten book entries (a financial term; I don't know if it's the correct one) have a different amount from the new ones that overwrote them.

That is where this query comes in: the first subquery takes all records that are not overwritten (apuntes.sobreescrito is NULL), and the second subquery takes all overwritten records, which are then matched against the results of the first.

In my test case, the second subquery generates 3 overwritten records for each valid book entry (there were 3 overwriting operations), which means comparing 60k vs 20k records.

As a next step, I will use GROUP_CONCAT to generate a JSON-formatted array from the results of the second subquery, but first I have to fix the performance issue.
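A minimal sketch of what that GROUP_CONCAT step could look like, building the JSON text by hand with CONCAT. It assumes id_diario, debe and haber are numeric and never NULL (a NULL would make CONCAT return NULL, and GROUP_CONCAT would skip that row); the alias antiguos_json is made up for the example.

```sql
-- Sketch only: one JSON-like array of overwritten totals per (asiento, fecha).
SELECT asiento, fecha,
       CONCAT(
           '[',
           GROUP_CONCAT(
               CONCAT('{"id_diario":', id_diario,
                      ',"debe":', debe,
                      ',"haber":', haber, '}')
               ORDER BY id_diario
               SEPARATOR ','
           ),
           ']'
       ) AS antiguos_json  -- hypothetical alias
FROM
(
    SELECT id_diario, asiento, fecha, SUM(debe) AS debe, SUM(haber) AS haber
    FROM apuntes
    WHERE sobreescrito IS NOT NULL
    GROUP BY asiento, fecha, id_diario
) asientos_antiguos
GROUP BY asiento, fecha;
```

GROUP_CONCAT output is truncated at group_concat_max_len (1024 by default), so that variable may need raising if an entry has been overwritten many times.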

The subqueries can use indexes internally, but when joining the results of these subqueries the indexes are not used. It might be possible to do this without subqueries, but I would need a better idea of what your data is (for example, what is the column id_diario, which is only brought back from the second subquery?)
– Kickstart, Commented Oct 21, 2014 at 9:02
Can you describe what the query is doing? There may be a simpler way to implement the logic.
– Gordon Linoff, Commented Oct 21, 2014 at 9:37
Just reading your edit. In your output, do you want to list records that have not been updated (i.e., with no matching record in the second subquery), or only those where there has been an update AND the sums are different?
– Kickstart, Commented Oct 21, 2014 at 12:50
I wonder whether you could use a more basic query to find the records where there is a mismatch, and then fetch the full details for just those. This might help if the changed records you are interested in are only a small fraction of all the records. For example, something like this should find the records that have changed: SELECT asiento, fecha FROM ( SELECT id_diario, asiento, fecha, SUM(debe) AS debe, SUM(haber) AS haber FROM apuntes GROUP BY id_diario, asiento, fecha ) sub0 GROUP BY asiento, fecha HAVING MIN(debe) != MAX(debe) OR MIN(haber) != MAX(haber)
– Kickstart, Commented Oct 21, 2014 at 12:56
@Kickstart that worked. Post your query as an answer and I will mark it as accepted. Thank you.
– sergio0983, Commented Oct 21, 2014 at 16:12

Kickstart · Accepted Answer · 2014-10-21 16:18:24Z

I wonder whether you could use a more basic query to find the records where there is a mismatch, and then fetch the full details for just those. This might help if the changed records you are interested in are only a small fraction of all the records.

For example, something like this should find the records that have changed:

SELECT asiento, fecha 
FROM 
( 
    SELECT id_diario, asiento, fecha, SUM(debe) AS debe, SUM(haber) AS haber 
    FROM apuntes 
    GROUP BY id_diario, asiento, fecha 
) sub0 
GROUP BY asiento, fecha 
HAVING MIN(debe) != MAX(debe) 
OR MIN(haber) != MAX(haber)

You could maybe use this to narrow down the actual records that you need to check.
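One way that narrowing might look, as a sketch rather than a tested solution: join the mismatching (asiento, fecha) pairs back to apuntes and list the totals per id_diario for only those entries. The alias changed is made up for the example, and the query assumes that all rows of one version of an entry share the same id_diario.

```sql
-- Sketch only: per-id_diario totals for entries whose totals differ.
SELECT a.id_diario, a.asiento, a.fecha,
       SUM(a.debe) AS debe, SUM(a.haber) AS haber
FROM apuntes a
INNER JOIN
(
    -- The mismatch query from above, used as a filter.
    SELECT asiento, fecha
    FROM
    (
        SELECT id_diario, asiento, fecha, SUM(debe) AS debe, SUM(haber) AS haber
        FROM apuntes
        GROUP BY id_diario, asiento, fecha
    ) sub0
    GROUP BY asiento, fecha
    HAVING MIN(debe) != MAX(debe)
    OR MIN(haber) != MAX(haber)
) changed  -- hypothetical alias
ON a.asiento = changed.asiento AND a.fecha = changed.fecha
GROUP BY a.id_diario, a.asiento, a.fecha
ORDER BY a.fecha, a.asiento, a.id_diario;
```

If only a few entries have changed, the outer join touches a small set of rows, which is where the saving over comparing two full derived tables would come from.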
