
Happy new year everyone!

I'm currently struggling with ETL performance issues while trying to write large pandas DataFrames (1-2 million rows, 150 columns) into an Oracle database. Even for just 1000 rows, pandas' default to_sql() method takes well over 2 minutes (see the code snippet below).

My strong hypothesis is that these performance issues are related to the underlying data types (mostly strings). I ran the same job on 1000 rows of random strings (benchmark: 3 min) and on 1000 rows of large random floats (benchmark: 15 seconds).

def _save(self, data: pd.DataFrame):
    engine = sqlalchemy.create_engine(self._load_args['con'])
    table_name = self._load_args["table_name"]

    # Prefix the table name with the schema, if one is configured
    if self._load_args.get("schema", None) is not None:
        table_name = self._load_args['schema'] + "." + table_name

    with engine.connect() as conn:
        data.to_sql(
            name=table_name,
            con=conn,  # the keyword argument of to_sql() is `con`, not `conn`
            if_exists='replace',
            index=False,
            method=None  # oracle dialect does not support multiline inserts
        )
    return

Does anyone here have experience with efficiently loading mixed data into an Oracle database using Python?

Any hints, code snippets and/or API recommendations are very much appreciated.

Cheers,

For very large DataFrames you will likely get the best performance if you dump the DataFrame to a CSV file (or similar) and then shell out to run SQL*Loader. Commented Jan 5, 2021 at 23:19

1 Answer


As you note in your question, you cannot use method='multi' with your database flavor. This is the key reason the inserts are so slow: the data goes in row by row.
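The gap between the string and float benchmarks in the question hints at a possible second factor. With SQLAlchemy, to_sql() maps object (string) columns to a generic text type, which the Oracle dialect creates as CLOB, and CLOB inserts tend to be much slower than VARCHAR inserts. The following is a minimal sketch of passing explicit column types instead. The helper name varchar_dtypes and the sizing rule (longest value in the column) are assumptions made for illustration, not something taken from the question.

```python
import pandas as pd
from sqlalchemy.types import VARCHAR


def varchar_dtypes(df: pd.DataFrame, minimum: int = 1) -> dict:
    """Map every string-like column to a VARCHAR sized to its longest value.

    Hypothetical helper: the name and the sizing rule are illustrative.
    """
    dtypes = {}
    for column, dtype in df.dtypes.items():
        # Only object columns and pandas string columns are considered
        if not (dtype == object or isinstance(dtype, pd.StringDtype)):
            continue
        longest = df[column].dropna().astype(str).str.len().max()
        # max() of an empty column is NaN, so fall back to the minimum length
        length = minimum if pd.isna(longest) else max(int(longest), minimum)
        dtypes[column] = VARCHAR(length)
    return dtypes
```

The result can be handed to to_sql() through its dtype argument, for example data.to_sql(name=table_name, con=conn, if_exists='replace', index=False, dtype=varchar_dtypes(data)). Whether this helps depends on the data: values longer than the database's VARCHAR limit still need a CLOB column.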

Using SQL*Loader, as suggested by @GordThompson, may be the fastest route for a relatively wide and large table. Example on setting up SQL*Loader
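A rough sketch of the CSV plus SQL*Loader route, assuming the target table already exists and the sqlldr client is on the PATH. The function names, file names (data.csv, load.ctl, load.log) and the userid string are placeholders chosen for illustration.

```python
import subprocess
from pathlib import Path


def build_control_file(table_name, columns, csv_path):
    """Return the text of a SQL*Loader control file for a comma-separated file."""
    column_list = ",\n  ".join(columns)
    return (
        "OPTIONS (SKIP=1)\n"  # skip the CSV header row
        "LOAD DATA\n"
        f"INFILE '{csv_path}'\n"
        "APPEND\n"  # add rows to the existing table
        f"INTO TABLE {table_name}\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "TRAILING NULLCOLS\n"
        f"(\n  {column_list}\n)\n"
    )


def load_with_sqlldr(df, table_name, userid, workdir):
    """Dump a pandas DataFrame to CSV and shell out to sqlldr.

    userid is a placeholder such as "user/password@dsn".
    """
    workdir = Path(workdir)
    csv_path = workdir / "data.csv"
    ctl_path = workdir / "load.ctl"
    df.to_csv(csv_path, index=False)
    ctl_path.write_text(build_control_file(table_name, list(df.columns), csv_path))
    subprocess.run(
        [
            "sqlldr",
            f"userid={userid}",
            f"control={ctl_path}",
            f"log={workdir / 'load.log'}",
            "direct=true",  # request a direct path load
        ],
        check=True,  # raise if sqlldr exits with an error
    )
```

Two caveats with this sketch: character fields default to a maximum of 255 characters in a control file, so longer strings need an explicit CHAR(n) next to the column name, and passing credentials on the command line exposes them to other users on the machine.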

Another option to consider is using cx_Oracle directly. See "Speed up to_sql() when writing Pandas DataFrame to Oracle database using SqlAlchemy and cx_Oracle".
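As a sketch of what using cx_Oracle directly could look like: the rows are bound to a single INSERT statement and sent in batches with cursor.executemany(). The helper names, the batch size, and the assumption that the table already exists with columns matching the DataFrame are all illustrative choices.

```python
from itertools import islice


def build_insert_sql(table_name, columns):
    """Build an INSERT statement with positional bind variables (:1, :2, ...)."""
    column_list = ", ".join(columns)
    binds = ", ".join(f":{i}" for i in range(1, len(columns) + 1))
    return f"INSERT INTO {table_name} ({column_list}) VALUES ({binds})"


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def bulk_insert(df, table_name, connection, batch_size=10_000):
    """Insert a pandas DataFrame through an open cx_Oracle connection."""
    sql = build_insert_sql(table_name, list(df.columns))
    # Replace NaN/NaT with None so that missing values are stored as NULL
    cleaned = df.astype(object).where(df.notna(), None)
    rows = cleaned.itertuples(index=False, name=None)
    cursor = connection.cursor()
    try:
        for batch in iter_batches(rows, batch_size):
            cursor.executemany(sql, batch)
        connection.commit()
    finally:
        cursor.close()
```

Here connection would come from something like cx_Oracle.connect(user=..., password=..., dsn=...). Committing once at the end keeps the load atomic; committing per batch is an alternative if partial loads are acceptable.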
