3

I am trying to "shuffle" last_names values in the names table. I am wanting to use a sub-select query that randomizes the order of the names and updates them accordingly. I wanting to do this for obfuscation reasons but want it to still look like a real data set.

The statement below returns a "ORA-01427: single-row subquery returns more than one row"

How can I make this work?

UPDATE schema.names set last_name = (
      SELECT * 
        FROM (
         SELECT last_name
         FROM schema.names
         ORDER BY DBMS_RANDOM.RANDOM))
2
  • 1
    Add where rownum = 1 to the outer SELECT. Commented Apr 3, 2018 at 15:58
  • @GordonLinoff this will set all lastnames to one random lastname though right? I want all the names to be shuffled around but unique. Commented Apr 3, 2018 at 16:02

2 Answers 2

2

Here is a query that shuffles the names:

select n.*, n2.name as new_name
from (select n.*, row_number() over (order by dbms_random.random) as seqnum
      from schema.names n
     ) n join
     (select n.*, row_number() over (order by dbms_random.random) as seqnum
      from schema.names n
     ) n2 
     on n.seqnum = n2.seqnum;

You can incorporate this into a merge, assuming you have a primary key:

merge into schema.names n
    using (select n.*, n2.name as new_name
           from (select n.*, row_number() over (order by dbms_random.random) as seqnum
                 from schema.names n
                ) n join
                (select n.*, row_number() over (order by dbms_random.random) as seqnum
                 from schema.names n
                ) n2 
                on n.seqnum = n2.seqnum
          ) nn
    on n.? = nn.?
when matched then update
    set n.name = nn.new_name;
Sign up to request clarification or add additional context in comments.

1 Comment

Not working for the query that shuffles names, "ORA-24804: Lob read/write functions called while another OCI LOB read/write streaming is in progress" Also needs to be DBMS_RANDOM.RANDOM instead of dbs.random.
0

Edit: the following PL/SQL block uses the query from Gordon's answer to drive a loop.

Setup:

create table demo
( name_original varchar2(10)
, name_new      varchar2(10) );

-- Two columns initially the same so we can check the results:
insert into demo
select column_value, column_value
from table(sys.dbms_debug_vc2coll('Jim','James','Joe','Jenny','Jane','Jacky'));

Code:

begin
    for r in (
        select n.rowid as rwd
             , n.name_original
             , n2.name_new as name_shuffled
        from   ( select n.*, row_number() over(order by dbms_random.value) as seqnum
                 from   demo n ) n
               join  
               ( select n.*, row_number() over(order by dbms_random.value) as seqnum
                 from   demo n ) n2
               on n.seqnum = n2.seqnum
    )
    loop
        update demo set name_new = r.name_shuffled
        where  rowid = r.rwd;
    end loop;
end;

My original answer is below, not especially helpful to the OP but I thought it was technically interesting anyway:


The optimiser realises that an uncorrelated single-row subquery only needs to be executed once, so you get one value looked up and applied to all rows. Possibly there is a hint that prevents this but I couldn't find one from a quick look (no_merge, no_unnest and rule had no effect).

The following worked (Oracle 12.1) but only by imposing an additional rule that old and new names had to be different, making it a correlated subquery which the optimiser has to evaluate for each row. (It also generates duplicates, because each random lookup is independent, so it may be no use to you.)

update demo d set name_new =
       ( select name_new
         from   demo d2
         where  d2.name_new <> d.name_new
         order by dbms_random.random
         fetch first row only );

select * from demo;

NAME_ORIGINAL NAME_NEW
------------- ----------
Jim           Jenny
James         Jane
Joe           Jacky
Jenny         Jane
Jane          Jacky
Jacky         Jim

At the risk of drifting off-topic, notice how adding the predicate where d2.name_new <> d.name_new changes the execution plan from this:

Plan hash value: 1813657616

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |      |      1 |        |      0 |00:00:00.01 |      27 |
|   1 |  UPDATE                      | DEMO |      1 |        |      0 |00:00:00.01 |      27 |
|   2 |   TABLE ACCESS STORAGE FULL  | DEMO |      1 |     82 |      6 |00:00:00.01 |       7 |
|*  3 |   VIEW                       |      |      1 |      1 |      1 |00:00:00.01 |       7 |
|*  4 |    WINDOW SORT PUSHED RANK   |      |      1 |     82 |      1 |00:00:00.01 |       7 |
|   5 |     TABLE ACCESS STORAGE FULL| DEMO |      1 |     82 |      6 |00:00:00.01 |       7 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=1)
   4 - filter(ROW_NUMBER() OVER ( ORDER BY "DBMS_RANDOM"."RANDOM"())<=1)

to this:

Plan hash value: 1813657616
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |      |      1 |        |      0 |00:00:00.01 |      62 |       |       |          |
|   1 |  UPDATE                      | DEMO |      1 |        |      0 |00:00:00.01 |      62 |       |       |          |
|   2 |   TABLE ACCESS STORAGE FULL  | DEMO |      1 |     82 |      6 |00:00:00.01 |       7 |  1025K|  1025K|          |
|*  3 |   VIEW                       |      |      6 |      1 |      6 |00:00:00.01 |      42 |       |       |          |
|*  4 |    WINDOW SORT PUSHED RANK   |      |      6 |      4 |      6 |00:00:00.01 |      42 |  2048 |  2048 | 2048  (0)|
|*  5 |     TABLE ACCESS STORAGE FULL| DEMO |      6 |      4 |     30 |00:00:00.01 |      42 |  1025K|  1025K|          |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=1)
   4 - filter(ROW_NUMBER() OVER ( ORDER BY "DBMS_RANDOM"."RANDOM"())<=1)
   5 - filter("D2"."NAME_NEW"<>:B1)

which is a neat example of the same execution plan (Plan hash value 1813657616) doing two rather different things, if ever you want one of those.

(If there is a hint that does the same thing, it would make a neat example of a hint changing the results.)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.