
So I have this huge table SS(someID, someDate, ...). I need to join a subset of this table to another table. The subset is determined by: select * from SS where someID in (select someID from SS where someDate between date1 and date2).
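
Roughly, the full statement has this shape (other_table and the exact join condition are simplified stand-ins for the real ones):

    select ss.*, ot.*
    from   SS ss
    join   other_table ot
        on ot.someID = ss.someID
    where  ss.someID in (select someID
                         from   SS
                         where  someDate between date1 and date2);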

When running this on an Oracle Exadata server in parallel, the execution takes a long time and a lot of TEMP space. Even though Oracle gets a cell offloading efficiency of 99% on the SS table, the subset query still brings back a large amount of data to the database server for the join with the other table.

Is there any way to make this more efficient, so that Oracle doesn't have to send back as much data and makes better use of cell offloading?

Below is the query plan

PLAN_TABLE_OUTPUT
Plan hash value: 3198983388

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name           | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                |  1044K|   589M| 46101   (1)| 00:01:33 |        |      |            |
|   1 |  PX COORDINATOR                         |                |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                   | :TQ10003       |  1044K|   589M| 46101   (1)| 00:01:33 |  Q1,03 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED                   |                |  1044K|   589M| 46101   (1)| 00:01:33 |  Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                          |                |       |       |            |          |  Q1,03 | PCWP |            |
|   5 |      PX SEND HASH                       | :TQ10001       |       |       |            |          |  Q1,01 | P->P | HASH       |
|   6 |       NESTED LOOPS                      |                |       |       |            |          |  Q1,01 | PCWP |            |
|   7 |        NESTED LOOPS                     |                |   523K|   135M| 38264   (1)| 00:01:17 |  Q1,01 | PCWP |            |
|   8 |         SORT UNIQUE                     |                | 29402 |   401K| 13751   (1)| 00:00:28 |  Q1,01 | PCWP |            |
|   9 |          PX RECEIVE                     |                | 29402 |   401K| 13751   (1)| 00:00:28 |  Q1,01 | PCWP |            |
|  10 |           PX SEND HASH                  | :TQ10000       | 29402 |   401K| 13751   (1)| 00:00:28 |  Q1,00 | P->P | HASH       |
|  11 |            PX BLOCK ITERATOR            |                | 29402 |   401K| 13751   (1)| 00:00:28 |  Q1,00 | PCWC |            |
|* 12 |             INDEX STORAGE FAST FULL SCAN| SUPERSET_IDX1  | 29402 |   401K| 13751   (1)| 00:00:28 |  Q1,00 | PCWP |            |
|* 13 |         INDEX RANGE SCAN                | XU_SUPERSET_01 |    18 |       |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|  14 |        TABLE ACCESS BY INDEX ROWID      | SUPERSET       |    18 |  4644 |     2   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|  15 |     PX RECEIVE                          |                |  2886K|   880M|  7834   (2)| 00:00:16 |  Q1,03 | PCWP |            |
|  16 |      PX SEND HASH                       | :TQ10002       |  2886K|   880M|  7834   (2)| 00:00:16 |  Q1,02 | P->P | HASH       |
|  17 |       PX BLOCK ITERATOR                 |                |  2886K|   880M|  7834   (2)| 00:00:16 |  Q1,02 | PCWC |            |
|  18 |        TABLE ACCESS STORAGE FULL        | POL_DTL        |  2886K|   880M|  7834   (2)| 00:00:16 |  Q1,02 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(SS.POL_ID=PD.POL_ID)
  12 - storage(IMPT_DT<=TO_DATE(' 2014-11-20 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND IMPT_DT>=TO_DATE(' 2014-10-28 
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter(IMPT_DT<=TO_DATE(' 2014-11-20 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND IMPT_DT>=TO_DATE(' 2014-10-28 
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  13 - access(SS.POL_ID=POL_ID)

Note
-----
   - Degree of Parallelism is 4 because of session

(Screenshot of the execution statistics and plan from Oracle Enterprise Manager.)

  • select * from SS where someID in (select someID from SS where someDate between date1 and date2). Could this really be called a self join? Or is it a correlated sub-query? I don't know if this changes anything for the optimizer, though... Anyway, without the query plan, it is hard to answer such a question. Commented Nov 19, 2014 at 16:40
  • I think select * from SS where someDate between date1 and date2 returns the same result as your query, so I don't understand the point very well. Commented Nov 19, 2014 at 16:50
  • I've added the execution stats and plan. As you can see, lots of activity... Commented Nov 19, 2014 at 16:54
  • Please run EXPLAIN PLAN FOR your_query_goes_here; then run SELECT * FROM table(DBMS_XPLAN.DISPLAY), and finally copy the result of that last query (in TEXT format) and paste it into the post. The image is pretty, but hard to read. Thank you. Commented Nov 19, 2014 at 18:12
  • I'm not sure how to preserve the table format of the plan; every time I copy/paste it here it loses all the formatting :( If you right-click on the image and open it in a new tab, it should display pretty clearly. I took this snapshot from Oracle Enterprise Manager. Commented Nov 19, 2014 at 18:56

1 Answer


There may not be much you can do to improve this query. The execution plan looks pretty good:

  1. Good objects: The indexes seem to fit the query well, although it's hard to tell without the full definitions.
  2. Good cardinality: The estimated rows and actual rows are close. This strongly implies the optimizer is doing a good job and is picking a near-optimal plan. If it can estimate the number of rows correctly, it will make wise decisions about access paths, join methods, join order, etc. Even the time estimate is close, which is rare. It looks like there are good table and system statistics. (A sketch for re-checking the actual rows and offload statistics follows this list.)
  3. Cell offloading: The storage predicates and the cell offloading shown in the active report imply that cell offloading is working as expected, at least in places.
  4. Parallelism: Large objects are already being processed in parallel. I don't see any obvious parallelism problems.
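
If you want to re-check the actual rows and offload statistics yourself, something along these lines should work (a sketch only; '&sql_id' is a placeholder for the real SQL_ID, and ALLSTATS LAST needs the gather_plan_statistics hint or statistics_level = all):

    -- Estimated vs. actual rows for a cursor still in the shared pool.
    select *
    from   table(dbms_xplan.display_cursor(sql_id => '&sql_id', format => 'ALLSTATS LAST'));

    -- SQL Monitor active report; on Exadata it also shows cell offload efficiency per step.
    select dbms_sqltune.report_sql_monitor(sql_id => '&sql_id', type => 'ACTIVE')
    from   dual;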

Here are some ideas, but don't expect drastic improvements:

  1. Full table scan: Force a full table scan instead of an index range scan with a hint like --+ no_index(superset XU_SUPERSET_01). With multiblock reads (used for full scans) and cell offloading (used for the direct path reads of a full scan, but not for an index range scan that goes through the buffer cache), a full table scan that reads all the data may be more efficient than an index range scan that reads less data. (See the sketch after this list.)
  2. Covering index: If the full table scan doesn't help, create an index that includes all the returned and queried columns, effectively a skinny version of the table. This gets the benefits of full scans (multiblock IO, cell offloading) but is smaller than the full table. (Sketch after this list.)
  3. Larger DOP: There's no magic number for the degree of parallelism (DOP), but in my experience the DOP sweet spot is almost always larger than 4. This may improve performance but will use more resources. (Sketch after this list.)
  4. Rewrite query? Re-writing the query may enable smart scan to process the join in the storage cells. Try changing

    select * from SS where someID in
      (select someID from SS where someDate between date1 and date2)
    

    to

    select distinct ss1.*
    from ss ss1
    join ss ss2
        on ss1.someID = ss2.someID
        and ss2.someDate between date1 and date2
    

    This new version does extra work. The join returns more rows than necessary, and then they need to be made distinct. That extra work may be worth it if it means the join can happen in the storage cells. I can't find a great source for exactly what kind of processing can be offloaded, but at least some types of joins can be.
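
As a rough illustration of ideas 1 to 3 (a sketch only; the alias, the covering-index name and column list, and the DOP value are hypothetical and need to be adapted to the real query):

    -- 1. Full table scan: ask the optimizer to ignore the index (the hint must reference the alias used in the query).
    select --+ no_index(ss XU_SUPERSET_01)
           ss.*
    from   superset ss
    where  ss.pol_id in (select pol_id
                         from   superset
                         where  impt_dt between date '2014-10-28' and date '2014-11-20');

    -- 2. Covering index: col1, col2 stand in for every column the query returns or filters on.
    create index superset_covering_idx on superset (pol_id, impt_dt, col1, col2);

    -- 3. Larger DOP: for example, force a DOP of 8 for the session instead of 4.
    alter session force parallel query parallel 8;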


1 Comment

Yeah, I don't think it can be improved any further. Thank you.
