Here is my query:

   SELECT ID As Col1,
   (
         SELECT VID FROM TABLE2 t 
         WHERE (a.ID=t.ID or a.ID=t.ID2) 
         AND t.STARTDTE =
         (
               SELECT MAX(tt.STARTDTE) 
               FROM TABLE2 tt 
               WHERE (a.ID=tt.ID or a.ID=tt.ID2)  AND tt.STARTDTE < SYSDATE
         )
   ) As Col2
   FROM TABLE1 a

Table1 has 48850 records and Table2 has 15944098 records.

I have separate indexes on TABLE2: one on ID; one on ID & STARTDTE; one on STARTDTE; and one on ID, ID2 & STARTDTE.

The query is still too slow. How can this be improved? Please help.

    Please show us the execution plan. Commented Sep 25, 2012 at 14:56

3 Answers

3

I'm guessing that the OR in the inner queries is interfering with the optimizer's ability to use the indexes. Also, given the size of TABLE2, I wouldn't recommend a solution that scans all of it.

This is why, in this case, I would suggest using a function that efficiently retrieves the information you are looking for (two index scans per call):

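CREATE OR REPLACE FUNCTION getvid(p_id table1.id%TYPE) 
   RETURN table2.vid%TYPE IS
   l_result table2.vid%TYPE;
BEGIN
   -- Fetch the latest past row matched on ID and the latest matched on ID2,
   -- then keep whichever of the two has the later STARTDTE.
   SELECT vid
     INTO l_result
     FROM (SELECT vid, startdte
             FROM (SELECT vid, startdte
                     FROM table2 t
                    WHERE t.id = p_id
                      AND t.startdte < SYSDATE
                    ORDER BY t.startdte DESC)
            WHERE rownum = 1  -- latest row matched on ID
           UNION ALL
           SELECT vid, startdte
             FROM (SELECT vid, startdte
                     FROM table2 t
                    WHERE t.id2 = p_id
                      AND t.startdte < SYSDATE
                    ORDER BY t.startdte DESC)
            WHERE rownum = 1  -- latest row matched on ID2
            ORDER BY startdte DESC)
    WHERE rownum = 1;
   RETURN l_result;
EXCEPTION
   -- No matching row in TABLE2: return NULL, as the original scalar subquery would.
   WHEN NO_DATA_FOUND THEN
      RETURN NULL;
END;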

Your SQL would become:

SELECT ID As Col1,
       getvid(a.id) As Col2
  FROM TABLE1 a

Make sure you have indexes on both table2(id, startdte DESC) and table2(id2, startdte DESC). The order of the columns in each index is very important.
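
As a sketch, the two indexes described above could be created like this. The index names are hypothetical, and the question already mentions an index on ID & STARTDTE, so check what exists before adding more:

```sql
-- Hypothetical index names; column order matches the lookups in getvid.
-- Leading column = equality predicate, second column = the sort key.
CREATE INDEX table2_id_startdte_ix  ON table2 (id,  startdte DESC);
CREATE INDEX table2_id2_startdte_ix ON table2 (id2, startdte DESC);
```

With the equality column first, each inner query in the function can in principle read a single index entry instead of sorting all rows for the id, though the actual plan should be verified.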

3 Comments

Vincent, would creating a view using your logic (UNION ALL) and then querying the view help? Which would be better: using the function, or a view directly?
@ram I don't think using a view in this case is possible because you could not push the predicate WHERE t.id = a.id deep enough to trigger the very efficient FULL INDEX SCAN (MIN/MAX). In fact I suggested creating a function because you couldn't inline this SQL in your query.
Do you have any documentation about this issue (or about inner queries)?
1

Possibly try the following, though untested.

WITH max_times AS
    (SELECT a.ID, MAX(t.STARTDTE) AS Startdte
     FROM TABLE1 a, TABLE2 t 
     WHERE (a.ID=t.ID OR a.ID=t.ID2) 
         AND t.STARTDTE < SYSDATE
     GROUP BY a.ID)
SELECT b.ID As Col1, tt.VID
FROM TABLE1 b
    LEFT OUTER JOIN max_times mt
    ON (b.ID = mt.ID)
    LEFT OUTER JOIN TABLE2 tt
    ON ((mt.ID=tt.ID OR mt.ID=tt.ID2) 
        AND mt.startdte = tt.startdte)

1

You can look at analytic functions to avoid having to hit the second table twice. Something like this might work:

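SELECT id AS col1, vid
FROM (
    -- Rank the matching table2 rows for each table1 id, latest past date first.
    -- Dates today or later evaluate to null and are pushed to the end.
    SELECT t1.id, t2.vid, RANK() OVER (PARTITION BY t1.id ORDER BY
        CASE WHEN t2.startdte < TRUNC(SYSDATE) THEN t2.startdte ELSE null END
        DESC NULLS LAST) AS rn
    FROM table1 t1
    -- id and id2 are both columns of table2
    JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)
)
WHERE rn = 1;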

The inner select gets the id and vid values from the two tables with a simple join on id or id2. The rank function calculates a ranking for each matching row in the second table based on the startdte. It's complicated a bit by you wanting to filter on that date, so I've used a case to effectively ignore any dates today or later by changing the evaluated value to null, and in this instance that means the order by in the over clause needs nulls last so they're ignored.

I'd suggest you run the inner select on its own first - maybe with just a couple of id values for brevity - to see what it's doing, and what ranks are being allocated.

The outer query is then just picking the top-ranked result for each id.

You may still get duplicates though; if table2 has more than one row for an id with the same startdte they'll get the same rank, but then you may have had that situation before. You may need to add more fields to the order by to break ties in a way that makes sense to you.
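
A minimal sketch of one way to break ties, assuming that vid is an acceptable tie-breaker (that choice is an assumption; any column that makes the order deterministic would do). ROW_NUMBER() gives every row in a partition a distinct number, so exactly one row per id survives the filter:

```sql
SELECT id AS col1, vid
FROM (
    SELECT t1.id, t2.vid,
           ROW_NUMBER() OVER (
               PARTITION BY t1.id
               ORDER BY CASE WHEN t2.startdte < TRUNC(SYSDATE)
                             THEN t2.startdte END DESC NULLS LAST,
                        t2.vid   -- assumed tie-breaker, not from the question
           ) AS rn
    FROM table1 t1
    JOIN table2 t2 ON t1.id IN (t2.id, t2.id2)
)
WHERE rn = 1;
```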

But this is largely speculation without being able to see where your existing query is actually slow.
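
To see where the time goes, you could capture the plan of the existing query along these lines. This is a sketch using the standard EXPLAIN PLAN statement and DBMS_XPLAN.DISPLAY; it shows the estimated plan, which may differ from what actually runs:

```sql
-- Store the optimizer's estimated plan for the original query.
EXPLAIN PLAN FOR
SELECT a.id AS col1,
       (SELECT t.vid
          FROM table2 t
         WHERE (a.id = t.id OR a.id = t.id2)
           AND t.startdte = (SELECT MAX(tt.startdte)
                               FROM table2 tt
                              WHERE (a.id = tt.id OR a.id = tt.id2)
                                AND tt.startdte < SYSDATE)) AS col2
  FROM table1 a;

-- Print the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```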

5 Comments

This will produce a full scan on TABLE2 which given its size may take a while. It still might be efficient if TABLE2 is skinny (not many columns).
@VincentMalgrat - True, unless the outer query was filtered, which the question does not imply. Would be nice if the optimiser could infer a stopkey from the rn check and still use the index on id, startdte efficiently, but I don't think that is likely (or reliable). And your point about index ordering is a good one.
Well, in most cases the intriguing join condition would probably baffle the optimizer. I'm not sure an outer WHERE condition could be pushed down to filter TABLE2 efficiently through a wise index scan :)
It would be nice, but maybe not realistic *8-) Still, another option to consider and benchmark, though I fully expect yours to be better by avoiding the full scan altogether.
I'm pretty sure we could engineer a case where the FULL SCAN comes out on top (15M rows can be very efficiently packed if the rows are small). Even if TABLE2 is huge, we could index (id, id2, startdte, vid) and access it with an INDEX FAST FULL SCAN with your solution.
