I'm looking for some help on how to bulk update an Oracle SQL table with up to 250,000 records.

Basically I have a list of keys that is passed in to a function that then needs to update an Oracle table. The list can contain up to 250,000 keys. I can do this using a normal update statement or using 'executemany', but both methods are too inefficient, so I need to do a bulk update, and I am not familiar with how to do this. I have searched for hours but I cannot figure this out!

todays_date = datetime.now().strftime("%d-%b-%Y")
status = str("DONE")

try:
  bind_values = {"status" : str(status),
                 "todays_date" : todays_date,
                 "keys_list" : list_of_keys}

  query = ("""FORALL i IN :keys_list.FIRST .. :keys_list.LAST
              UPDATE TABLE_NAME
              SET COLUMN1 = :status,
              UPDATE_DATE = :todays_date
              WHERE KEY = :i""")

  cursor.execute(query, bind_values)
  conn.commit()
  self.CloseConnection(conn)
except cx_Oracle.DatabaseError, e:
  error, = e.args
  print("  >> Database error: %s" % format(e))
  conn.rollback()
  return False

Any help would be appreciated.

UPDATE: @abarnert - thank you very much for the suggestion, you are definitely on to something here. I managed to get this far:

cursor.execute("""CREATE GLOBAL TEMPORARY TABLE TodaysKeys
                 (key STRING PRIMARY KEY)
                 on commit delete rows
                 AS (INSERT INTO TodaysKeys VALUES (:i))
                 UPDATE TABLE_NAME
                 SET COLUMN1 = :status,
                 UPDATE_DATE = :todays_date
                 WHERE KEY IN (SELECT * FROM TodaysKeys)
                 TABLE TodaysKeys""", i=keys_list, 
                 status=str(updatestatus), 
                 todays_date=todays_date)

But now all I get is an error: "ORA-01036: illegal variable name/number". I am sure it is something really obvious, but I have checked and rechecked and can't for the life of me see where I am going wrong!

From all the research into this approach, it seems to be the right method...if I can just get it working to test! Please help.

  • What do you mean by "too inefficient"? Do you mean that the upload takes too much time? How will you know when a solution is efficient enough? Commented Jan 4, 2014 at 0:52
  • If a single UPDATE statement across 250K rows is too slow, the problem is almost certainly in your data model or your database configuration, and there's nothing you can do from Python that will speed that up. Commented Jan 4, 2014 at 0:57
  • If the table is 250K rows and you're trying to update every row at once, it might be faster to dump it to a file, change the file, and LOAD DATA (or maybe use the separate bulk loader)… but I doubt it. Commented Jan 4, 2014 at 0:59
  • "too inefficient" - yes, it takes too much time; it times out. For example, updating 75,000 rows takes approximately 3 hours, which is crazy! This is doing one update per record! Not sure how else to do it. Commented Jan 4, 2014 at 1:20

2 Answers

You need to find some way to do 1 operation across 250K rows, instead of 250K separate 1-row operations, because obviously, given something about your data model design (which I'm guessing you neither control nor understand) the latter is just way too slow.

So, how do you do that?

One way is to create a dead-simple temporary table, dump all of today's keys into it (which should be much faster with executemany, or, if not, at least much simpler with LOAD DATA…), then do an UPDATE that refers to the keys from that temp table. Like this (pseudocode, based on testing something with sqlite3 and then converting to Oracle from distant memory…):

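-- Oracle spells this GLOBAL TEMPORARY; each statement is executed separately
CREATE GLOBAL TEMPORARY TABLE TodaysKeys (key INT PRIMARY KEY) ON COMMIT DELETE ROWS

-- run once per key, e.g. via executemany
INSERT INTO TodaysKeys VALUES (:i)

-- must run in the same transaction as the inserts, before any commit
UPDATE TABLE_NAME
    SET COLUMN1 = :status,
    UPDATE_DATE = :todays_date
    WHERE KEY IN (SELECT key FROM TodaysKeys)

DROP TABLE TodaysKeys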

If this is slow, that implies that you don't have an index on the KEY column, in which case… really there's no way to speed this up without fixing that.
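
For what it's worth, here is a minimal runnable sketch of the temp-table pattern described above. It uses Python's built-in sqlite3 module purely as a stand-in so it can be run anywhere; it is not Oracle code. The helper name `bulk_mark_done` and the TEXT key type are assumptions for illustration, and on Oracle the CREATE statement and placeholder style would differ.

```python
import sqlite3


def bulk_mark_done(conn, keys, status, todays_date):
    """Update every row whose KEY is in `keys` using a single UPDATE.

    Hypothetical helper; sqlite3 is a stand-in for the real database.
    Returns the number of rows the UPDATE touched.
    """
    cur = conn.cursor()
    # Scratch table that holds only the keys to update (SQLite syntax).
    cur.execute("CREATE TEMPORARY TABLE TodaysKeys (key TEXT PRIMARY KEY)")
    try:
        # executemany wants one parameter tuple per inserted row.
        cur.executemany(
            "INSERT INTO TodaysKeys VALUES (?)", [(k,) for k in keys]
        )
        # One statement covers all matching rows instead of one per key.
        cur.execute(
            """UPDATE TABLE_NAME
               SET COLUMN1 = :status,
                   UPDATE_DATE = :todays_date
               WHERE KEY IN (SELECT key FROM TodaysKeys)""",
            {"status": status, "todays_date": todays_date},
        )
        updated = cur.rowcount
        conn.commit()
    except sqlite3.DatabaseError:
        conn.rollback()
        raise
    finally:
        # Clean up the scratch table whether or not the update succeeded.
        cur.execute("DROP TABLE TodaysKeys")
    return updated
```

The point is that the list of keys is only ever bound row by row in the INSERT; the UPDATE itself takes just the two scalar values.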


4 Comments

FWIW, in the world of datawarehousing, indexes tend to be seen as more of a liability than a benefit - because they cause excessive intertrack seeks, which are much slower than intratrack seeks.
This might help: infolab.stanford.edu/~ullman/fcdb/oracle/or-load.html . At least, with the (now dead) Datallegro product, bulk load sped up massive loads a lot.
@abarnert - thanks for the suggestion, hopefully this is the correct answer....once I can get it to run to test I will let you know.
@dstromberg: The trick here is that you can't bulk load updates. Using a temporary table to dump the keys into may make the bulk load unnecessary—but if it's still necessary, the temporary key table also happens to make the bulk load dead-simple. As for your other comment, this sounds more like a production table to me; usually with data warehousing you're inserting 250K rows/day but updating almost nothing… But since the OP hasn't told us, that's really just a guess.

Thanks to all for your suggestions. I finally used this approach; hopefully it helps someone else in a similar scenario in the future.

query = """
         DECLARE
           CURSOR rec_cur IS
           SELECT UNQ_KEY
           FROM TABLE_NAME
           WHERE COLUMN1 = 'NEW';
           TYPE updated_keys IS TABLE OF VARCHAR(100);
           pk_tab updated_keys;
         BEGIN
           OPEN rec_cur;
           LOOP
             FETCH rec_cur BULK COLLECT INTO pk_tab LIMIT 5000;
             EXIT WHEN pk_tab.COUNT() = 0;

             FORALL i IN pk_tab.FIRST .. pk_tab.LAST
               UPDATE TABLE_NAME
               SET    COLUMN1 = :status,
                      UPDATE_DATE = :todays_date
               WHERE  unq_key = pk_tab(i);
           END LOOP;
           CLOSE rec_cur;
         END;
     """

Thanks again
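
In case it helps anyone wiring this up, below is a rough sketch of how a PL/SQL block like the one above might be executed from Python. It assumes a DB-API style connection (for example one from cx_Oracle) and passes the two bind values as a dictionary; `run_plsql_update` is a made-up helper name, not part of any library. Binding a real datetime rather than a formatted string is probably safer, since a string relies on the session's date format for the implicit conversion.

```python
def run_plsql_update(conn, plsql_block, status, todays_date):
    """Execute an anonymous PL/SQL block with two named bind values.

    Hypothetical helper: `conn` is assumed to be a DB-API connection
    (e.g. from cx_Oracle) and `plsql_block` a string that uses the
    :status and :todays_date placeholders and ends with "END;".
    """
    cursor = conn.cursor()
    try:
        # Each named placeholder is bound once, by name.
        cursor.execute(
            plsql_block, {"status": status, "todays_date": todays_date}
        )
        conn.commit()
    except Exception:
        # Undo any partial work before re-raising to the caller.
        conn.rollback()
        raise
    finally:
        cursor.close()
```

It would be called as something like `run_plsql_update(conn, query, "DONE", datetime.now())`, with `query` being the block shown above.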
