I have 3 tables:

  • Pi - images
  • Pidl - images dl log => Pi
  • Pirl - images resize log => Pidl

Basically, an image is downloaded and a log record is created in Pidl. After that, it is resized and a record is created in Pirl; that record references the Pidl record it was resized from.

I am writing a query to find which images need to be resized; it mainly queries Pidl. The algorithm I've devised is simple:

for each Image in Pi {
    pidlA=newest_pidl(Image);
    if(pidlA.status == success) {
        pirlA=newest_pirl(Image);
        if(pirlA.pidl.hash != pidlA.hash)
        {
            go;
        }
        else if(pirlA.status != success){
            failed_attempts = failed_pirl_count(pirlA, newest_successful_pirl(Image))
            decide based on pirlA.time and failed_attempts if go or not
        }
        else
        {
            dont go;
        }
    }
    else
    {
        dont go;
    }
}

And here is my query. It is not finished yet (the failed-attempts part is missing), but it is already too slow, so I'd like to fix that first.

SELECT 
pidl1A.pidl_id

FROM Pidl as pidl1A

LEFT JOIN Pidl as pidl2A
ON (
    pidl1A.pidl_pi_id = pidl2A.pidl_pi_id AND 
    pidl2A.pidl_status = 1 AND
    (pidl2A.pidl_time > pidl1A.pidl_time OR 
        (pidl2A.pidl_id > pidl1A.pidl_id and pidl1A.pidl_time=pidl2A.pidl_time)
    )
) 

LEFT JOIN (
    #newest pirl subquery#
    SELECT 
    pidl1B.pidl_pi_id as sub_pi_id, 
    pidl1B.pidl_hash as sub_pidl_hash,
    pirl1B.pirl_id as sub_pirl_id,
    pirl1B.pirl_status as sub_pirl_status
    FROM Pirl as pirl1B 

    INNER JOIN Pidl as pidl1B ON (pirl1B.pirl_pidl_id = pidl1B.pidl_id)

    LEFT JOIN (
        SELECT
        pidl2B.pidl_pi_id as sub_pi_id,
        pirl2B.pirl_id as sub_pirl_id,
        pirl2B.pirl_time as sub_pirl_time
        FROM Pirl as pirl2B 
        INNER JOIN Pidl as pidl2B ON (pirl2B.pirl_pidl_id = pidl2B.pidl_id)
        WHERE 1
    ) as pirl3B 
    ON (
        pirl3B.sub_pi_id = pidl1B.pidl_pi_id and 
        (pirl3B.sub_pirl_time > pirl1B.pirl_time or
            (pirl3B.sub_pirl_time = pirl1B.pirl_time and
            pirl3B.sub_pirl_id > pirl1B.pirl_id)
        )
    )

    WHERE 
    pirl3B.sub_pirl_id is null
) as pirl1A
ON (pirl1A.sub_pi_id = pidl1A.pidl_pi_id)

WHERE 
pidl1A.pidl_status = 1 AND pidl2A.pidl_id IS NULL
AND (
    pirl1A.sub_pirl_id IS NULL
    OR (
        pidl1A.pidl_hash !=  pirl1A.sub_pidl_hash
    )
    OR (
        pirl1A.sub_pirl_status != 1
    )
)

And this is my db schema:

CREATE TABLE Pi (
  `pi_id` int,
   PRIMARY KEY (`pi_id`)
  )
;

CREATE TABLE Pidl
    (
      `pidl_id` int,
      `pidl_pi_id` int,
      `pidl_status` int,
      `pidl_time` int,
     `pidl_hash` varchar(16),
   PRIMARY KEY (`pidl_id`)
    )
;

alter table Pidl
  add constraint fk1_branchNo foreign key (pidl_pi_id) references Pi (pi_id);

CREATE TABLE Pirl
    (
      `pirl_id` int,
      `pirl_pidl_id` int,
      `pirl_status` int,
      `pirl_time` int,
   PRIMARY KEY (`pirl_id`)
    )
;

alter table Pirl
  add constraint fk2_branchNo foreign key (pirl_pidl_id) references Pidl (pidl_id);

INSERT INTO Pi
  (`pi_id`)
  VALUES
  (3),
  (4),
  (5);

INSERT INTO Pidl
    (`pidl_id`, `pidl_pi_id`,`pidl_status`,`pidl_time`, `pidl_hash`)
VALUES
    (1, 3, 1,100, 'hashA'),
    (2, 3, 1,150,'hashB'),
    (3, 4, 2, 200,'hashC'),
    (4, 3, 1, 200,'hashA')
;

INSERT INTO Pirl
    (`pirl_id`, `pirl_pidl_id`,`pirl_status`,`pirl_time`)
VALUES
    (1, 2, 0,100),
    (2, 3, 1,150),
    (3, 1, 2, 200)
;

Of course, with 3 records it's fast, but with around 10-30k it takes more than 5 seconds. What I've found is that the part that makes it slow is the last part of the WHERE clause:

AND (
    pirl1A.sub_pirl_id IS NULL
    OR (
        pidl1A.pidl_hash !=  pirl1A.sub_pidl_hash
    )
    OR (
        pirl1A.sub_pirl_status != 1
    )
)

The other strange thing I've found is that adding DISTINCT made the query a bit faster, but still not fast enough.

2 Answers

When I read your requirements, I come up with a query like this:

select pidl.*
from pidl join
     -- newest download time per image; the join keeps only the matching pidl rows
     (select image, max(pidl_time) as pidl_time
      from pidl
      group by image
     ) maxpidl
     on pidl.image = maxpidl.image and pidl.pidl_time = maxpidl.pidl_time left join
     pirl
     on pidl.hash = pirl.hash
where pirl.hash is null;

I think you have some other conditions that are not fully explained (such as the role of status). You should be able to incorporate that.

In MySQL, you should avoid subqueries in the from clause. These are materialized and -- as a result -- there is additional overhead for that work and the engine cannot subsequently use indexes.
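
As a rough sketch of that advice, the "newest row" lookups can be written as correlated NOT EXISTS conditions instead of derived tables in the FROM clause. The table and column names below are taken from the schema in the question. It assumes that status 1 means success (as the question's query does), that hashes are never NULL, and that "newest download" means the newest record of any status, as in the pseudocode. It has not been timed against the real data.

```sql
-- Downloads that should be resized: the newest download of an image
-- succeeded, and the newest resize of that image is missing, failed,
-- or was made from a different hash.
SELECT d.pidl_id
FROM Pidl AS d
WHERE d.pidl_status = 1
  -- d is the newest download of its image (ties broken by id)
  AND NOT EXISTS (
        SELECT 1
        FROM Pidl AS d2
        WHERE d2.pidl_pi_id = d.pidl_pi_id
          AND (d2.pidl_time > d.pidl_time
               OR (d2.pidl_time = d.pidl_time AND d2.pidl_id > d.pidl_id))
      )
  -- there is no newest resize that both succeeded and used the same hash
  AND NOT EXISTS (
        SELECT 1
        FROM Pirl AS r
        INNER JOIN Pidl AS rd ON rd.pidl_id = r.pirl_pidl_id
        WHERE rd.pidl_pi_id = d.pidl_pi_id
          AND r.pirl_status = 1
          AND rd.pidl_hash = d.pidl_hash
          -- r is the newest resize of the image (ties broken by id)
          AND NOT EXISTS (
                SELECT 1
                FROM Pirl AS r2
                INNER JOIN Pidl AS rd2 ON rd2.pidl_id = r2.pirl_pidl_id
                WHERE rd2.pidl_pi_id = d.pidl_pi_id
                  AND (r2.pirl_time > r.pirl_time
                       OR (r2.pirl_time = r.pirl_time AND r2.pirl_id > r.pirl_id))
              )
      );
```

On the sample rows from the question this should return only pidl_id 4: it is the newest download for image 3 and succeeded, while the newest resize for that image (pirl_id 3) has status 2. The failed-attempts rule from the pseudocode is not covered here.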

1 Comment

I don't understand your query at all. There is no direct hash column in Pirl either. As for the materialization, I don't understand what's so hard about looping over the 6k results to check some boolean conditions, even if no indexes exist. Also, I don't think this works: select image, max(pidl_time) as pidl_time from pidl group by image, since the row returned doesn't have to be the one where the max was encountered.

Your queries aren't using your indexes; instead they join against subqueries (derived tables), which can be very slow. I would suggest creating new, indexed tables that hold the information you need, or a materialized view.
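
For illustration only, composite indexes along these lines would cover the columns the question's joins compare on. The index names are made up, and whether the optimizer actually picks them up needs checking with EXPLAIN on the real data. InnoDB already creates single-column indexes for the two foreign keys, so these only add the time and id columns used for the "newest row" comparisons.

```sql
-- Hypothetical index names; the columns come from the question's schema.
-- Helps find the newest download per image.
CREATE INDEX idx_pidl_image_time ON Pidl (pidl_pi_id, pidl_time, pidl_id);

-- Helps join resize records to their download and order them by time.
CREATE INDEX idx_pirl_pidl_time ON Pirl (pirl_pidl_id, pirl_time, pirl_id);

-- Prefix a query with EXPLAIN to see which indexes MySQL chooses,
-- e.g. for the newest download of image 3:
EXPLAIN
SELECT pidl_id
FROM Pidl
WHERE pidl_pi_id = 3
ORDER BY pidl_time DESC, pidl_id DESC
LIMIT 1;
```

These indexes cannot help the outer join against a derived table in the original query, because the derived table is materialized first; they mainly pay off once the lookups reference Pidl and Pirl directly.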

3 Comments

New tables, but what should they contain? And can't I just fix the query so that it works? A materialized view is just a cache, and I need to execute the query, not cache its result.
Probably information that indexes what you join on here: pirl3B.sub_pi_id = pidl1B.pidl_pi_id and (pirl3B.sub_pirl_time > pirl1B.pirl_time or (pirl3B.sub_pirl_time = pirl1B.pirl_time and pirl3B.sub_pirl_id > pirl1B.pirl_id) )
So basically MySQL needs like 0.5 secs just to loop an array of 6000 rows which have like 3 integer properties and do a simple comparison/boolean checks on them?
