I am having a baffling issue with the pandas 'chunksize' parameter. I wrote a Python program that loops through a set of values and builds a query from each one. The query results need to be written to .csv files and sent to a colleague. The results are large, so the .csv files need to be written chunk by chunk.

See the following code:

import pandas as pd

values = [col1, col2, col3]

for col in values:
    sql_query = "SELECT " + col + " other columns..." + " from big_table WHERE some condition..."

    # Write each chunk of the result to the file as it arrives
    for chunk in pd.read_sql(sql_query, conn, chunksize=80000):
        chunk.to_csv(output_path + 'filename.csv', index=False, mode='a')

At first, I thought this program was working as the files were written with no issue. I decided to do a basic sanity check - comparing the number of rows in the raw query vs the number of lines in the file. They did not match.

I ran the same sql_query directly against the database, but with count(*) in place of the column list, like so:

SELECT  count(*) from big_table WHERE some condition;

result: ~1,500,000 rows

Then, I counted the rows in the file: ~1,500,020 rows

The same thing happened for every file: the counts were off by 20 to 30 rows. I am not sure how this is possible, because the queries should be passed to the DB exactly as I have written them. Am I misunderstanding how 'chunksize' works in pandas? Is it possible that some chunks are overlapping or incomplete?

  • Wrapped lines in the CSV file? Try writing the file back to an empty table and see what you get? Commented Jan 9, 2022 at 1:19

1 Answer

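Each chunk gets its own header line, because to_csv writes the column names on every call by default. With roughly 1,500,000 rows and chunksize = 80000 there are about 19 chunks, so the file ends up with about 19 header lines instead of one, which accounts for the extra lines you counted. Set header=False for every chunk but the first, or for all chunks if you do not need a header at all.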
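A minimal sketch of the header fix, assuming an in-memory SQLite table as a stand-in for the real database. The table contents, the tiny chunksize, and the temporary output path are all hypothetical and chosen only so the example runs on its own:

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Stand-in for the real database: an in-memory SQLite table with 10 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    [(i, f"row{i}") for i in range(10)],
)
conn.commit()

# Hypothetical output location for the demo.
out_file = os.path.join(tempfile.mkdtemp(), "filename.csv")

# chunksize=3 splits the 10 rows into 4 chunks.
chunks = pd.read_sql("SELECT id, val FROM big_table", conn, chunksize=3)
for i, chunk in enumerate(chunks):
    first = i == 0
    chunk.to_csv(
        out_file,
        index=False,
        mode="w" if first else "a",  # start a fresh file, then append
        header=first,                # column names only on the first chunk
    )

# Count the lines in the finished file: one header plus the data rows.
with open(out_file) as f:
    line_count = sum(1 for _ in f)
```

Opening the file with mode="w" for the first chunk also avoids appending to a file left over from an earlier run, which would inflate the line count in the same way.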

Better yet, use Python directly and bypass pandas. Then you won't need to work in chunks in the first place, and it should be much faster.
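One way to read "use Python directly" is the standard csv module fed by a database cursor. The sketch below is an assumption about what that could look like, not necessarily what was meant: it again uses a hypothetical in-memory SQLite table, and it relies on the cursor being iterable, which sqlite3 supports but which is an optional extension in the DB-API. Whether rows are actually streamed or buffered in full on the client depends on the driver.

```python
import csv
import os
import sqlite3
import tempfile

# Stand-in for the real database: an in-memory SQLite table with 10 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    [(i, f"row{i}") for i in range(10)],
)
conn.commit()

# Hypothetical output location for the demo.
out_file = os.path.join(tempfile.mkdtemp(), "filename.csv")

cursor = conn.cursor()
cursor.execute("SELECT id, val FROM big_table")

rows_written = 0
with open(out_file, "w", newline="") as f:
    writer = csv.writer(f)
    # cursor.description holds one entry per column; the name comes first.
    writer.writerow([column[0] for column in cursor.description])
    # Write rows one at a time as the cursor yields them.
    for row in cursor:
        writer.writerow(row)
        rows_written += 1

# Read the file back to confirm: one header line plus the data rows.
with open(out_file, newline="") as f:
    csv_rows = list(csv.reader(f))
```

Because the header is written exactly once before the loop, the number of lines in the file is always the row count plus one.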
