I'm using the cx_Oracle module to create a SQLAlchemy engine and retrieve data from an Oracle database with Python and Pandas, but performance is poor and I'm getting database errors.

My code works, although slowly, if I select just a couple of columns: after a long time I do get all 3.5 million rows. Here's the code I'm using:

import cx_Oracle
import pandas as pd
import config
from sqlalchemy import create_engine

engine = create_engine('oracle+cx_oracle://config.user:[email protected]:1521/?service_name=config.service')

sql = "select ITEM_NO, COUNTRY_CODE, LANGUAGE_CODE from table_ben_l_t where (LANGUAGE_CODE = 'es' and COUNTRY_CODE = 'ES')"

df = pd.read_sql(sql, engine)

As soon as I add more columns, the query no longer completes: after a very long delay (I'm talking hours) I get a database error.

I know that this is not the only way to retrieve the data and create a Pandas DataFrame, but it is definitely the most convenient. Is there a better/safer way to get this data from the Oracle DB? I was thinking about downloading the rows chunk by chunk and dumping them into a list of dicts that could be passed to Pandas, but this doesn't seem very "elegant"... I'm sure there must be a better way to do this... :-)
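
The chunk-by-chunk idea can be sketched with the DB-API `fetchmany()` call. This is only a minimal illustration under stated assumptions: `iter_batches` is a hypothetical helper name, the batch size is arbitrary, and `sqlite3` stands in for cx_Oracle so the example runs without a database server (the same cursor calls are part of the DB-API that cx_Oracle also implements).

```python
import sqlite3  # stand-in driver so the sketch runs anywhere; NOT Oracle


def iter_batches(connection, sql, batch_size=50_000):
    """Yield lists of dicts, one list per batch of at most batch_size rows.

    Hypothetical helper: works with any DB-API 2.0 connection.
    """
    cursor = connection.cursor()
    try:
        # arraysize is the DB-API hint for how many rows to fetch at a time.
        cursor.arraysize = batch_size
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            yield [dict(zip(columns, row)) for row in rows]
    finally:
        cursor.close()


# Demo on an in-memory table that mimics the columns from the question.
demo = sqlite3.connect(":memory:")
demo.execute(
    "create table table_ben_l_t (ITEM_NO text, COUNTRY_CODE text, LANGUAGE_CODE text)"
)
demo.executemany(
    "insert into table_ben_l_t values (?, ?, ?)",
    [(str(i), "ES", "es") for i in range(25)],
)
batches = list(
    iter_batches(
        demo,
        "select ITEM_NO, COUNTRY_CODE, LANGUAGE_CODE from table_ben_l_t",
        batch_size=10,
    )
)
batch_sizes = [len(batch) for batch in batches]
```

Each batch could then be turned into a frame with `pd.DataFrame(batch)` and the frames combined with `pd.concat`. `pd.read_sql` also accepts a `chunksize` argument that returns an iterator of DataFrames, which may be the more convenient route.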

Thanks in advance to anyone who can help me! :-)

JF

EDIT 1: In response to @OldProgrammer and @crocarneiro:

There are no indexes on the columns, according to the result of the query select * from all_ind_columns where table_name = 'TABLE_BEN_L_T';

EDIT 2: This is the error message that I received:

DatabaseError: (cx_Oracle.DatabaseError) ORA-01555: snapshot too old: rollback segment number 509 with name "_SYSSMU509_3146905099$" too small
(Background on this error at: http://sqlalche.me/e/14/4xp6)

Thanks a lot too @Christopher Jones! This looks very interesting!

  • Are there indexes on the columns in your WHERE clause? Commented May 31, 2021 at 18:08
  • When you say you are getting a database error, can you please elaborate a bit more on that? Also, it would be easier to provide suggestions if you mention the use case of your code: what do you intend to do with 3.5M rows, and do you really need this many for further processing? Commented May 31, 2021 at 18:36
  • Hello @OldProgrammer, I'm really not that familiar with databases in general, I can only write simple queries similar to the one I used. Could you please tell me how I can find the information about the indexes? Commented May 31, 2021 at 18:58
  • Hi @jfpelletier, barging in: if you are using an Oracle database and want to know if there are any indexes on your table, you can run this query: select * from all_ind_columns where table_name = 'TABLE_BEN_L_T';. The query will return the indexed columns and the index names. If you are using SQL Developer, Toad, or another IDE, SHIFT + F4 or CTRL + click on the table name usually shows you this information. Commented May 31, 2021 at 19:41
  • You need to tune the number of batches that the data is fetched in. Fundamentally this is handled by cx_Oracle underneath Pandas with the cursor's arraysize and the newer prefetchrows attributes. The documentation for tuning is in Tuning Fetch Performance. Also see oralytics.com/2018/11/14/… Commented May 31, 2021 at 22:49

1 Answer

I finally solved the issue by tuning the cursor's arraysize and prefetchrows attributes (the fetch-size settings suggested by Christopher Jones).

Setting these attributes dramatically improved the performance, and the time-outs and error messages went away!
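
A minimal sketch of what such tuning can look like follows. It is an illustration under stated assumptions, not the exact code used: `fetch_with_tuned_cursor` is a hypothetical helper name, the values 10,000 and 10,001 are illustrative rather than measured, and `connection` is assumed to be an open cx_Oracle connection (`prefetchrows` needs cx_Oracle 8.0 or later).

```python
def fetch_with_tuned_cursor(connection, sql, arraysize=10_000, prefetchrows=10_001):
    """Run sql with larger fetch buffers and return (column_names, rows).

    Hypothetical helper; `connection` is assumed to be a cx_Oracle connection.
    """
    cursor = connection.cursor()
    try:
        # Both attributes must be set before execute() to affect the query.
        # cx_Oracle's defaults are small (arraysize 100, prefetchrows 2), so a
        # multi-million-row result needs a very large number of round trips.
        cursor.prefetchrows = prefetchrows
        cursor.arraysize = arraysize
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return columns, rows


# Assumed usage (credentials and DSN are placeholders, not from the question):
#   import cx_Oracle
#   import pandas as pd
#   connection = cx_Oracle.connect(user=..., password=..., dsn=...)
#   columns, rows = fetch_with_tuned_cursor(connection, sql)
#   df = pd.DataFrame(rows, columns=columns)
```

Larger buffers trade memory for fewer round trips, so the best values depend on row width and network latency. When staying with `pd.read_sql` and a SQLAlchemy engine, the cx_Oracle dialect also documents an `arraysize` argument to `create_engine`, which may be enough on its own.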

Thanks very much all who helped! :-)

JF

1 Comment

Hi jfpelletier. This fits more as a comment rather than an answer. If you want to contribute to SO, it would be great if you could expand on your answer and show the solution.
