How to optimize this SELECT with sub query Oracle

Question

Here is my query,

   SELECT ID As Col1,
   (
         SELECT VID FROM TABLE2 t 
         WHERE (a.ID=t.ID or a.ID=t.ID2) 
         AND t.STARTDTE =
         (
               SELECT MAX(tt.STARTDTE) 
               FROM TABLE2 tt 
               WHERE (a.ID=tt.ID or a.ID=tt.ID2)  AND tt.STARTDTE < SYSDATE
         )
   ) As Col2
   FROM TABLE1 a

Table1 has 48850 records and Table2 has 15944098 records.

I have separate indexes on TABLE2: one on ID; one on ID & STARTDTE; one on STARTDTE; and one on ID, ID2 & STARTDTE.

The query is still too slow. How can this be improved? Please help.

Please show us the execution plan.

– user330315, Commented Sep 25, 2012 at 14:56
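
For anyone following along, one common way to capture the plan being asked for is sketched below. It assumes the default PLAN_TABLE is available and simply wraps the query from the question; the output will depend on your own statistics and indexes.

```sql
-- Store the optimizer's plan for the query from the question.
EXPLAIN PLAN FOR
SELECT a.id AS col1,
       (SELECT t.vid
          FROM table2 t
         WHERE (a.id = t.id OR a.id = t.id2)
           AND t.startdte = (SELECT MAX(tt.startdte)
                               FROM table2 tt
                              WHERE (a.id = tt.id OR a.id = tt.id2)
                                AND tt.startdte < SYSDATE)) AS col2
  FROM table1 a;

-- Display the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```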

Vincent Malgrat · Accepted Answer · 2012-09-25 15:16:17Z

3

I'm guessing that the OR in the inner queries is interfering with the optimizer's ability to use the indexes. I also wouldn't recommend a solution that scans all of TABLE2, given its size.

This is why, in this case, I would suggest using a function that efficiently retrieves the information you are looking for (two index scans per call):

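CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE) 
   RETURN table2.vid%TYPE IS
   l_result table2.vid%TYPE;
BEGIN
   -- Take the latest past row matched on ID and the latest past row matched
   -- on ID2, then keep whichever of the two has the most recent STARTDTE.
   -- If neither branch finds a row, SELECT INTO raises NO_DATA_FOUND, which
   -- a calling SQL statement reports as NULL.
   SELECT vid
     INTO l_result
     FROM (-- Latest row before now where ID matches
           SELECT vid, startdte
             FROM (SELECT vid, startdte
                     FROM table2 t
                    WHERE t.id = p_id
                      AND t.startdte < SYSDATE
                    ORDER BY t.startdte DESC)
            WHERE rownum = 1
           UNION ALL
           -- Latest row before now where ID2 matches
           SELECT vid, startdte
             FROM (SELECT vid, startdte
                     FROM table2 t
                    WHERE t.id2 = p_id
                      AND t.startdte < SYSDATE
                    ORDER BY t.startdte DESC)
            WHERE rownum = 1
            ORDER BY startdte DESC)
    WHERE rownum = 1;
   RETURN l_result;
END;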

Your SQL would become:

SELECT ID As Col1,
       getvid(a.id) vid
  FROM TABLE1 a

Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The column order within each index is very important: the equality column (id or id2) must come first, followed by startdte.
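
A minimal sketch of those two indexes follows. The index names are placeholders of my own, not something from the question:

```sql
-- Equality column first, then the date column in descending order,
-- so the newest STARTDTE for a given id is the first entry read.
CREATE INDEX table2_id_startdte_ix  ON table2 (id,  startdte DESC);
CREATE INDEX table2_id2_startdte_ix ON table2 (id2, startdte DESC);
```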

edited Sep 25, 2012 at 15:16

answered Sep 25, 2012 at 14:55

Vincent Malgrat


3 Comments

ozzboy Over a year ago

Vincent, would creating a view using your logic (UNION ALL) and then querying the view help? Which would be better: using the function, or using a view directly?

Vincent Malgrat Over a year ago

@ram I don't think using a view in this case is possible because you could not push the predicate WHERE t.id = a.id deep enough to trigger the very efficient FULL INDEX SCAN (MIN/MAX). In fact I suggested creating a function because you couldn't inline this SQL in your query.

Alperen Üretmen Over a year ago

Do you have any documentation about this issue (or about inner queries in general)?

John D · Accepted Answer · 2012-09-25 14:43:41Z

1

Possibly try the following, though untested.

WITH max_times AS
    (SELECT a.ID, MAX(t.STARTDTE) AS Startdte
     FROM TABLE1 a, TABLE2 t 
     WHERE (a.ID=t.ID OR a.ID=t.ID2) 
         AND t.STARTDTE < SYSDATE
     GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
    LEFT OUTER JOIN max_times mt
    ON (b.ID = mt.ID)
    LEFT OUTER JOIN TABLE2 tt
    ON ((mt.ID=tt.ID OR mt.ID=tt.ID2) 
        AND mt.startdte = tt.startdte)

answered Sep 25, 2012 at 14:43

John D


Alex Poole · Accepted Answer · 2012-09-25 14:59:45Z

1

You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:

SELECT id AS col1, vid
FROM (
    SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
        CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
        DESC NULLS LAST) AS rn  -- newest qualifying date ranks first
    FROM table1 t1
    JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)  -- ID2 is a column of table2
)
WHERE rn = 1;

The inner select gets the id and vid values from the two tables with a simple join on id or id2. The rank function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by you wanting to filter on that date, so I've used a case to effectively ignore any dates today or later by changing the evaluated value to null, and in this instance that means the order by in the over clause needs nulls last so they're ignored.

I'd suggest you run the inner select on its own first - maybe with just a couple of id values for brevity - to see what it's doing, and what ranks are being allocated.
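
As a rough sketch of that check, something like the following could be run first. The id values 1 and 2 are made-up samples, and the join assumes ID2 is a column of TABLE2 as in the question, with the ordering descending so the most recent qualifying date ranks first:

```sql
SELECT t1.id,
       t2.vid,
       t2.startdte,
       RANK() OVER (
           PARTITION BY t1.id
           ORDER BY CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte END
                    DESC NULLS LAST
       ) AS rn
  FROM table1 t1
  JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)
 WHERE t1.id IN (1, 2)   -- hypothetical sample ids, replace with real ones
 ORDER BY t1.id, rn;
```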

The outer query is then just picking the top-ranked result for each id.

You may still get duplicates though; if table2 has more than one row for an id with the same startdte they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the order by to break ties in a way that makes sense to you.
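
One possible way to break ties, shown only as a sketch: switch to ROW_NUMBER() and add a second sort key. Using vid as the tie-breaker is an assumption here; pick whichever column makes sense for your data.

```sql
SELECT id AS col1, vid
  FROM (SELECT t1.id,
               t2.vid,
               ROW_NUMBER() OVER (
                   PARTITION BY t1.id
                   ORDER BY CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte END
                            DESC NULLS LAST,
                            t2.vid   -- assumed tie-breaker, not from the question
               ) AS rn
          FROM table1 t1
          JOIN table2 t2 ON t1.id IN (t2.id, t2.id2))
 WHERE rn = 1;   -- ROW_NUMBER() never repeats, so at most one row per id
```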

But this is largely speculation without being able to see where your existing query is actually slow.

answered Sep 25, 2012 at 14:59

Alex Poole


5 Comments

Vincent Malgrat Over a year ago

This will produce a full scan on TABLE2 which given its size may take a while. It still might be efficient if TABLE2 is skinny (not many columns).

Alex Poole Over a year ago

@VincentMalgrat - True, unless the outer query was filtered, which the question does not imply. Would be nice if the optimiser could infer a stopkey from the rn check and still use the index on id, startdte efficiently, but I don't think that is likely (or reliable). And your point about index ordering is a good one.

Vincent Malgrat Over a year ago

Well, in most cases the intriguing join condition would probably baffle the optimizer. I'm not sure an outer WHERE condition could be pushed and filter efficiently TABLE2 through a wise index scan :)

Alex Poole Over a year ago

It would be nice, but maybe not realistic *8-) Still, another option to consider and benchmark, though I fully expect yours to be better by avoiding the full scan altogether.

Vincent Malgrat Over a year ago

I'm pretty sure we could engineer a case where the FULL SCAN comes on top (15M rows can be very efficiently packed if the rows are small). Even if TABLE2 is huge, we could index (id, id2, startdte, vid) and access it in a FAST FULL INDEX SCAN with your solution.
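
A sketch of the covering index mentioned in that last comment, with a placeholder name. Whether the optimizer actually chooses a fast full scan of it depends on statistics and, as far as I know, on at least one of the indexed columns being declared NOT NULL:

```sql
-- Holds every column the analytic query reads from table2,
-- so the table itself need not be visited.
CREATE INDEX table2_cover_ix ON table2 (id, id2, startdte, vid);
```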
