I have a table with three columns: PID, LOCID and ISMGR. Currently, a person is flagged ISMGR=true per location ID. Under a new requirement, any person who has ISMGR=true for at least one location must have ISMGR=true for all of their locations (that is, a manager for any one location should be a manager for all locations).

Table Data before running the script:

PID  LOCID  ISMGR
1    1      1
1    2      0
1    3      0
2    1      0
2    2      1

Table Data after running the script:

PID  LOCID  ISMGR
1    1      1
1    2      1
1    3      1
2    1      1
2    2      1

Any help will be highly appreciated.

Thanks in advance.

1
  • Did you actually mean PL/SQL (Oracle's procedural language), or simply Oracle SQL? And I don't actually see any question. Commented Oct 24, 2016 at 1:52

4 Answers

2

I would be inclined to write this using exists:

update t 
    set ismgr = 1
    where ismgr = 0 and
          exists (select 1 from t t2 where t2.pid = t.pid and t2.ismgr = 1);

exists should be more efficient than doing a subquery with an aggregation.

This will work best with indexes on t(pid, ismgr) and t(ismgr).
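A quick way to confirm the update did what was asked is to look for any pid that still has a mix of manager and non-manager rows. The query below is only a sketch, assuming the table is named t as in the statement above; it is not part of the original answer.

```sql
-- Sanity check (assumes the table is named t, as in the update above).
-- Lists every pid that still has both ismgr = 0 and ismgr = 1 rows.
-- Before the update it shows the pids that will be affected;
-- after the update it should return no rows.
select   pid
from     t
group by pid
having   min(ismgr) = 0
   and   max(ismgr) = 1;
```

On the sample data from the question, running it before the update should list pids 1 and 2.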


6 Comments

Thanks, Gordon! Bingo!
Is it? It seems to me the base table would be read again and again in the correlated subquery for each row in the outer query. The aggregated subquery is evaluated only once. I may do some testing if I find a few minutes, it's an interesting question.
@mathguy . . . Not at all. The exists should be a simple index lookup. The outer query might or might not use an index to find the rows that match the first condition.
Good point about the index, I will try it both ways, with or without an index. Of course, an index will help the grouping, too.
Just to be clear, your answer is perfectly fine, I am not challenging it (or your observation about performance), it's just that in the past I heard many such pronouncements that turned out not to be true. I posted a lot of performance comparisons like this on OTN, and a few right here on SO. I will post results of my tests here, with full scripts so they can be repeated. Even if the exists solution is faster, it will be interesting to see how much faster.
2

This is not an answer but a test of the two solutions offered so far - I will call them the "EXISTS" and the "AGGREGATE" solutions or approaches.

Details of the tests are below, but here are two overall conclusions:

  1. Both approaches have comparable execution times; on average the AGGREGATE approach worked a little faster than the EXISTS approach, but by a very small margin (smaller than the differences between running times from one trial to the next). Without indexes on any columns, the run times were as follows (the first number is for the EXISTS approach, the second for AGGREGATE):

     Trial 1: 8.19s 8.08s
     Trial 2: 8.98s 8.22s
     Trial 3: 9.46s 9.55s

     Note: estimated optimizer costs should be used only to compare different execution plans for the same statement, not for different solutions using different approaches. Even so, someone will inevitably ask; so, for the EXISTS approach the lowest cost the Optimizer found was 4766; for AGGREGATE, 2665. Again, though, this is completely meaningless.

  2. If a lot of rows need to be updated, indexes will hurt performance much more than they help it, because when rows are updated the indexes must be updated as well. If only a small number of rows must be updated, the indexes will help, because most of the time is spent finding the rows to update and only a little is spent on the updates themselves. In my example almost 25% of the rows had to be updated, so with the indexes in place the AGGREGATE solution took 51.2 seconds and the EXISTS solution took 59.3 seconds! RECOMMENDATION: if you expect that a large number of rows may need to be updated, and you already have indexes on the table, you may be better off DROPPING them and re-creating them after the updates. Or perhaps there are other solutions to this problem; I am not an expert (keep that in mind!).
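The drop-and-recreate recommendation could look roughly like the following. This is a sketch only: it reuses the table and index names from the test script further down, and whether it actually pays off depends on table size, the share of rows updated, and what else relies on the indexes in the meantime.

```sql
-- Sketch of "drop the indexes, update, re-create the indexes".
-- Names (tbl, pid_ismgr_idx, ismgr_ids) are taken from the test script below.
-- Caution: DDL statements commit implicitly in Oracle, so once the indexes
-- are re-created the update can no longer be rolled back.
drop index pid_ismgr_idx;
drop index ismgr_ids;

update tbl t
set    ismgr = 1
where  ismgr = 0
  and  exists (select 1 from tbl t2 where t2.pid = t.pid and t2.ismgr = 1);

commit;

create index pid_ismgr_idx on tbl(pid, ismgr);
create index ismgr_ids on tbl(ismgr);
```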

To test properly, after I created the test table and committed, I ran each solution by itself, then I rolled back and, logged in as SYS (in a different session), I ran alter system flush buffer_cache to make sure performance is not randomly helped by cache hits or hurt by misses. In all cases everything is done from disk storage.
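One timing cycle of the procedure just described might be scripted as below. This is an illustrative SQL*Plus sketch, not the exact script used for the numbers above; `set timing on` is a SQL*Plus command, and the flush has to be run by a suitably privileged user.

```sql
-- Session 1: time one candidate statement, then undo it.
set timing on

update tbl t
set    ismgr = 1
where  ismgr = 0
  and  exists (select 1 from tbl t2 where t2.pid = t.pid and t2.ismgr = 1);

rollback;

-- Session 2, connected as SYS: empty the buffer cache so the next
-- trial reads from disk again instead of benefiting from cached blocks.
alter system flush buffer_cache;
```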

I created a table with ids from 1 to 1.2 million, each paired with a random integer between 1 and 3, chosen with probabilities 40%, 40% and 20% respectively (see the use of dbms_random below). From this prep data I created the test table: each pid was included once, twice or three times according to this random integer, and a random 0 or 1 was added as ismgr (with 50-50 probability) in each row. I also added a random integer between 1 and 4 as locid just to simulate the actual data; I didn't worry about duplicate locid values since that column plays no role in the problem.

Of the 1.2 million pids, approximately 480,000 (40%) appear just once in the test table, another ~480,000 appear twice and ~240,000 three times. Total rows should be about 2,160,000. That's the cardinality of the base table (in reality it ended up being 2,160,546). Then: none of the ~480,000 rows with unique pid need to be changed; half of the 480,000 pids with a count of 2 will have the same ismgr (so no change) and the other half will be split, so we will need to change 240,000 rows from these; and a simple combinatorial argument shows that 3/8, or 270,000 rows, of the 720,000 rows for pids that appear three times in the table must be changed. So we should expect that 510,000 rows should be changed. In fact the update statements resulted in 510,132 rows updated (same for both solutions). These sanity checks show that the test was probably set up correctly. Below I show also a small sample from the base table, also as a sanity check.
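The combinatorial argument can be spelled out as follows. This is one way to arrive at the same figures, assuming each ismgr value is independently 0 or 1 with probability 1/2; the symbol E_n (expected number of rows changed for a pid that appears n times) is introduced here only for illustration.

```latex
% A pid with n rows and exactly k zeros contributes k changed rows,
% provided at least one row has ismgr = 1, i.e. 1 <= k <= n - 1.
\[
E_n = \sum_{k=1}^{n-1} k \binom{n}{k} \frac{1}{2^{n}}
\]
\[
E_2 = \frac{1 \cdot 2}{4} = \frac{1}{2},
\qquad
E_3 = \frac{1 \cdot 3 + 2 \cdot 3}{8} = \frac{9}{8}
\]
% 480,000 pids appear twice and 240,000 pids appear three times.
\[
480{,}000 \cdot \frac{1}{2} + 240{,}000 \cdot \frac{9}{8}
  = 240{,}000 + 270{,}000 = 510{,}000
\]
```

The 270,000 rows for the three-row pids are indeed 3/8 of their 720,000 rows.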

CREATE TABLE statement:

create table tbl as
  with prep ( pid, dup ) as (
          select level,
                 round( dbms_random.value(0.5, 3) ) as dup
          from   dual
          connect by level <= 1200000
       )
  select pid,
         round( dbms_random.value(0.5, 4.5) ) as locid,
         round( dbms_random.value(0, 1) )     as ismgr
  from   prep
  connect by level <= dup
      and prior pid = pid
      and prior sys_guid() is not null
;

commit;

Sanity checks:

select count(*) from tbl;

  COUNT(*)
----------
   2160546

select * from tbl where pid between 324720 and 324730;

       PID      LOCID      ISMGR
---------- ---------- ----------
    324720          4          1
    324721          1          0
    324721          4          1
    324722          3          0
    324723          1          0
    324723          3          0
    324723          3          1
    324724          3          1
    324724          2          0
    324725          4          1
    324725          2          0
    324726          2          0
    324726          1          0
    324727          3          0
    324728          4          1
    324729          1          0
    324730          3          1
    324730          3          1
    324730          2          0

 19 rows selected 
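In the same spirit, the number of rows either UPDATE should touch can be counted up front. The query below is a sketch that was not part of the original test; the aliases rows_to_update and any_mgr are made up for illustration.

```sql
-- Count the rows that have ismgr = 0 but belong to a pid
-- with at least one ismgr = 1 row (tbl is the test table created above).
select count(*) as rows_to_update
from  (select ismgr,
              max(ismgr) over (partition by pid) as any_mgr
       from   tbl)
where  ismgr = 0
  and  any_mgr = 1;
```

The result should match the "rows updated" figure reported by both UPDATE statements.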

UPDATE statements:

update tbl t
    set ismgr = 1
    where ismgr = 0 and
          exists (select 1 from tbl t2 where t2.pid = t.pid and t2.ismgr = 1);

rollback;

update tbl
set    ismgr = 1
where  ismgr = 0
  and  pid in ( select   pid
                from     tbl 
                group by pid 
                having   max(ismgr) = 1);

rollback;

-- statements to create indexes, used in separate testing:
create index pid_ismgr_idx on tbl(pid, ismgr);
create index ismgr_ids on tbl(ismgr);

1 Comment

You are pretty thorough. I would be surprised if even 25% of the rows need to be updated . . . actually, once there is an update per data page, the use of indexes probably doesn't help much. I wonder if the timings you mention are simply the extra time to load the indexes as well as the data pages.
1

Why PL/SQL? All you need is a plain SQL statement. For example:

update your_table t  -- enter your actual table name here
set    ismgr = 1
where  ismgr = 0
  and  pid in ( select   pid
                from     your_table 
                group by pid 
                having   max(ismgr) = 1)
;

2 Comments

Thanks, Mathguy. Looks like Gordon's response is a little better in perf, hence accepting his. Thank you anyway!
@user2948533 - no worries, Gordon gave you a perfectly fine answer.
0

The existing solutions are perfectly fine, but I prefer to use merge any time I'm updating rows from a correlated sub-query. I find it to be more readable and the performance is typically commensurate with the exists method.

MERGE INTO t
USING      (SELECT DISTINCT pid
            FROM   t
            WHERE  ismgr = 1) src
ON         (t.pid = src.pid)
WHEN MATCHED THEN
   UPDATE SET ismgr = 1
      WHERE      ismgr = 0;

As @mathguy pointed out, in this case using group by and having is more efficient than distinct. To use that with merge is just a matter of changing the sub-query:

MERGE INTO t
USING      (SELECT   pid
            FROM     t
            GROUP BY pid
            HAVING   MAX(ismgr) = 1) src
ON         (t.pid = src.pid)
WHEN MATCHED THEN
   UPDATE SET ismgr = 1
      WHERE      ismgr = 0;
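To compare how Oracle executes the two formulations, the plans can be inspected with EXPLAIN PLAN and DBMS_XPLAN. This is a generic sketch, shown for the group by/having variant; as noted elsewhere in this thread, the estimated cost is only meaningful for comparing plans of the same statement.

```sql
-- Generate the execution plan for the merge without running it.
explain plan for
merge into t
using      (select   pid
            from     t
            group by pid
            having   max(ismgr) = 1) src
on         (t.pid = src.pid)
when matched then
   update set ismgr = 1
      where      ismgr = 0;

-- Display the plan that was just generated.
select * from table(dbms_xplan.display);
```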

3 Comments

Generally I prefer MERGE too. In this case, you need to SELECT DISTINCT or else MERGE won't work, and that adds overhead. I just tested against the same table and I get slightly higher execution times (9.27s, 10.73s, 9.62s), but it's a different time of day, and who knows what processes are running in the background on my machine. Also, I am not sure whether a subquery with GROUP BY and HAVING would shave a few fractions of a second off the execution time compared to SELECT DISTINCT; I don't know which is more efficient (it probably depends on many things).
@mathguy: A little experimentation shows that the group by/having formulation is a bit faster, but that can be easily added to the merge. In fact, the update you provided and the merge using group by/having have effectively the same plan.
Right, that's what I meant, in both respects. Cheers!
