MySQL Slow query with multiple joins and subqueries

Question

I have 3 tables:

Pi - images
Pidl - image download log => Pi
Pirl - image resize log => Pidl

Basically, an image is downloaded and a log record is created in Pidl. After that, it's resized and a record is created in Pirl, which is linked to the Pidl record.

I am writing a query to find which images need to be resized; it mainly queries Pidl. The algorithm I've devised is simple:

for each Image in Pi {
    pidlA = newest_pidl(Image);
    if (pidlA.status == success) {
        pirlA = newest_pirl(Image);
        // never resized, or the last resize was made from a different download
        if (pirlA == null || pirlA.pidl.hash != pidlA.hash)
        {
            go;
        }
        else if (pirlA.status != success) {
            failed_attempts = failed_pirl_count(pirlA, newest_successful_pirl(Image));
            decide based on pirlA.time and failed_attempts whether to go or not
        }
        else
        {
            dont go;
        }
    }
    else
    {
        dont go;
    }
}

And now my query (although it is not yet finished: the failed-attempts part is missing, but it's already too slow, so I'd like to fix that first).

SELECT 
pidl1A.pidl_id

FROM Pidl as pidl1A

LEFT JOIN Pidl as pidl2A
ON (
    pidl1A.pidl_pi_id = pidl2A.pidl_pi_id AND 
    pidl2A.pidl_status = 1 AND
    (pidl2A.pidl_time > pidl1A.pidl_time OR 
        (pidl2A.pidl_id > pidl1A.pidl_id and pidl1A.pidl_time=pidl2A.pidl_time)
    )
) 

LEFT JOIN (
    #newest pirl subquery#
    SELECT 
    pidl1B.pidl_pi_id as sub_pi_id, 
    pidl1B.pidl_hash as sub_pidl_hash,
    pirl1B.pirl_id as sub_pirl_id,
    pirl1B.pirl_status as sub_pirl_status
    FROM Pirl as pirl1B 

    INNER JOIN Pidl as pidl1B ON (pirl1B.pirl_pidl_id = pidl1B.pidl_id)

    LEFT JOIN (
        SELECT
        pidl2B.pidl_pi_id as sub_pi_id,
        pirl2B.pirl_id as sub_pirl_id,
        pirl2B.pirl_time as sub_pirl_time
        FROM Pirl as pirl2B 
        INNER JOIN Pidl as pidl2B ON (pirl2B.pirl_pidl_id = pidl2B.pidl_id)
        WHERE 1
    ) as pirl3B 
    ON (
        pirl3B.sub_pi_id = pidl1B.pidl_pi_id and 
        (pirl3B.sub_pirl_time > pirl1B.pirl_time or
            (pirl3B.sub_pirl_time = pirl1B.pirl_time and
            pirl3B.sub_pirl_id > pirl1B.pirl_id)
        )
    )

    WHERE 
    pirl3B.sub_pirl_id is null
) as pirl1A
ON (pirl1A.sub_pi_id = pidl1A.pidl_pi_id)

WHERE 
pidl1A.pidl_status = 1 AND pidl2A.pidl_id IS NULL
AND (
    pirl1A.sub_pirl_id IS NULL
    OR (
        pidl1A.pidl_hash !=  pirl1A.sub_pidl_hash
    )
    OR (
        pirl1A.sub_pirl_status != 1
    )
)

And this is my db schema:

CREATE TABLE Pi (
  `pi_id` int,
   PRIMARY KEY (`pi_id`)
  )
;

CREATE TABLE Pidl
    (
      `pidl_id` int,
      `pidl_pi_id` int,
      `pidl_status` int,
      `pidl_time` int,
     `pidl_hash` varchar(16),
   PRIMARY KEY (`pidl_id`)
    )
;

alter table Pidl
  add constraint fk1_branchNo foreign key (pidl_pi_id) references Pi (pi_id);

CREATE TABLE Pirl
    (
      `pirl_id` int,
      `pirl_pidl_id` int,
      `pirl_status` int,
      `pirl_time` int,
   PRIMARY KEY (`pirl_id`)
    )
;

alter table Pirl
  add constraint fk2_branchNo foreign key (pirl_pidl_id) references Pidl (pidl_id);

INSERT INTO Pi
  (`pi_id`)
  VALUES
  (3),
  (4),
  (5);

INSERT INTO Pidl
    (`pidl_id`, `pidl_pi_id`,`pidl_status`,`pidl_time`, `pidl_hash`)
VALUES
    (1, 3, 1,100, 'hashA'),
    (2, 3, 1,150,'hashB'),
    (3, 4, 2, 200,'hashC'),
    (4, 3, 1, 200,'hashA')
;

INSERT INTO Pirl
    (`pirl_id`, `pirl_pidl_id`,`pirl_status`,`pirl_time`)
VALUES
    (1, 2, 0,100),
    (2, 3, 1,150),
    (3, 1, 2, 200)
;

Of course, with 3 records it's fast, but with around 10-30k rows it takes more than 5 seconds. What I've found is that the slow part is the last condition of the WHERE clause:

AND (
    pirl1A.sub_pirl_id IS NULL
    OR (
        pidl1A.pidl_hash !=  pirl1A.sub_pidl_hash
    )
    OR (
        pirl1A.sub_pirl_status != 1
    )
)

The other strange thing I've found is that adding DISTINCT made the query a bit faster, but still not fast enough.
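
One way to check where the time actually goes is to run EXPLAIN on the "newest pirl" part on its own. The sketch below reuses the aliases from the query above and only prefixes the inner SELECT with EXPLAIN; it assumes the schema shown earlier. In the output, a row with select_type DERIVED, or a type of ALL against pirl3B, would suggest that the derived table is materialized and scanned without an index. Newer MySQL versions may merge the derived table into the outer query, in which case no DERIVED row appears.

```sql
-- Sketch: show the execution plan for the "newest pirl" anti-join only.
EXPLAIN
SELECT pirl1B.pirl_id
FROM Pirl AS pirl1B
INNER JOIN Pidl AS pidl1B ON pirl1B.pirl_pidl_id = pidl1B.pidl_id
LEFT JOIN (
    SELECT pidl2B.pidl_pi_id AS sub_pi_id,
           pirl2B.pirl_id    AS sub_pirl_id,
           pirl2B.pirl_time  AS sub_pirl_time
    FROM Pirl AS pirl2B
    INNER JOIN Pidl AS pidl2B ON pirl2B.pirl_pidl_id = pidl2B.pidl_id
) AS pirl3B
  ON pirl3B.sub_pi_id = pidl1B.pidl_pi_id
 AND (pirl3B.sub_pirl_time > pirl1B.pirl_time
      OR (pirl3B.sub_pirl_time = pirl1B.pirl_time
          AND pirl3B.sub_pirl_id > pirl1B.pirl_id))
WHERE pirl3B.sub_pirl_id IS NULL;
```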

Gordon Linoff · Accepted Answer · 2016-05-07 15:18:35Z

1

When I read your requirements, I come up with a query like this:

-- schematic: image and hash are placeholders, not actual column names in the schema
select pidl.*
from pidl join
     (select image, max(pidl_time) as pidl_time
      from pidl
      group by image
     ) maxpidl
     on pidl.image = maxpidl.image and pidl.pidl_time = maxpidl.pidl_time left join
     pirl
     on pidl.hash = pirl.hash
where pirl.hash is null;

I think you have some other conditions that are not fully explained (such as the role of status). You should be able to incorporate that.

In MySQL, you should avoid subqueries in the FROM clause. These are typically materialized and -- as a result -- there is additional overhead for that work, and the engine generally cannot use the base tables' indexes on the materialized result.
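
As an illustration of that advice applied to the actual schema, here is a minimal sketch that expresses the question's conditions with correlated NOT EXISTS subqueries instead of derived tables. The aliases (d, newer, r, rd, r2, rd2) are made up for this example, and the query is an assumption about the intended logic rather than a tested replacement: it returns the newest successful download of each image unless the newest resize of that image succeeded and was made from a download with the same hash. It treats a NULL hash or status as "needs resizing", which differs slightly from the != comparisons in the original query.

```sql
SELECT d.pidl_id
FROM Pidl AS d
WHERE d.pidl_status = 1
  -- d is the newest successful download of its image
  AND NOT EXISTS (
        SELECT 1
        FROM Pidl AS newer
        WHERE newer.pidl_pi_id = d.pidl_pi_id
          AND newer.pidl_status = 1
          AND (newer.pidl_time > d.pidl_time
               OR (newer.pidl_time = d.pidl_time
                   AND newer.pidl_id > d.pidl_id))
      )
  -- skip d only when the newest resize of the image succeeded on the same hash
  AND NOT EXISTS (
        SELECT 1
        FROM Pirl AS r
        INNER JOIN Pidl AS rd ON rd.pidl_id = r.pirl_pidl_id
        WHERE rd.pidl_pi_id = d.pidl_pi_id
          AND r.pirl_status = 1
          AND rd.pidl_hash = d.pidl_hash
          -- r is the newest resize of the image
          AND NOT EXISTS (
                SELECT 1
                FROM Pirl AS r2
                INNER JOIN Pidl AS rd2 ON rd2.pidl_id = r2.pirl_pidl_id
                WHERE rd2.pidl_pi_id = d.pidl_pi_id
                  AND (r2.pirl_time > r.pirl_time
                       OR (r2.pirl_time = r.pirl_time
                           AND r2.pirl_id > r.pirl_id))
              )
      );
```

With the sample rows from the question this should return pidl_id 4, the same row the original query selects: the newest resize for image 3 has status 2. Whether it is faster on 10-30k rows depends on the indexes available, so it is worth comparing both versions with EXPLAIN.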

answered May 7, 2016 at 15:18

Gordon Linoff

1 Comment

M. Ivanov Over a year ago

I don't understand your query at all. There is no direct hash in pirl either. As for the materialization, I don't understand what's so hard about looping over the 6k results to check some boolean conditions, even if no indexes exist. Also, I don't think this subquery works, since the row returned doesn't have to be the one where the max was encountered: select image, max(pidl_time) as pidl_time from pidl group by image

gr1zzly be4r · Accepted Answer · 2016-05-07 15:20:29Z

0

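Your queries aren't using your indexes, and are instead joining against derived tables (subqueries in the FROM clause). This can be very slow. I would suggest making new tables that are indexed with the information that you need, or a materialized view (which MySQL has no built-in support for, so it would have to be emulated with a regular table).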
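
A minimal sketch of what indexing the join columns could look like, assuming InnoDB and the schema from the question. The index names are hypothetical and the column order is a guess based on the join conditions, not a measured recommendation. InnoDB already creates a single-column index for each foreign key, so these composite indexes only add the status, time and id columns used in the "is there a newer row" comparisons. Without a usable index, each row is compared against every other candidate row, which for a few thousand rows already means millions of comparisons.

```sql
-- Hypothetical index names; column order is an assumption.
-- Supports "newer successful download of the same image" lookups on Pidl.
CREATE INDEX idx_pidl_image_status_time
    ON Pidl (pidl_pi_id, pidl_status, pidl_time, pidl_id);

-- Supports "newer resize for the same download" lookups on Pirl.
CREATE INDEX idx_pirl_pidl_time
    ON Pirl (pirl_pidl_id, pirl_time, pirl_id);
```

Indexes on the base tables do not apply to comparisons against a materialized derived table such as pirl3B, so they are likely to matter most once the inner subquery is written against Pidl and Pirl directly.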

answered May 7, 2016 at 15:20

gr1zzly be4r

3 Comments

M. Ivanov Over a year ago

New tables, but what should they contain? And can't I just fix the query to work? A materialized view is just a cache, and I need to execute the query, not cache its result.

gr1zzly be4r Over a year ago

Probably information that indexes what you join on here:

pirl3B.sub_pi_id = pidl1B.pidl_pi_id and (pirl3B.sub_pirl_time > pirl1B.pirl_time or (pirl3B.sub_pirl_time = pirl1B.pirl_time and pirl3B.sub_pirl_id > pirl1B.pirl_id))

M. Ivanov Over a year ago

So basically MySQL needs like 0.5 secs just to loop an array of 6000 rows which have like 3 integer properties and do a simple comparison/boolean checks on them?
