
I am trying to read a CSV file into a DataFrame, but I'm having issues because the CSV is too large (the process is being killed).

I am only trying to do some simple updates to the DataFrame.

This is my current code:

import pandas as pd

df = pd.read_csv(input_file)
df = df[df.col_5 != 'col_5']  # drop rows where col_5 holds the literal string 'col_5'
columns_req = ['COL_1', 'COL_2', 'COL_3', 'COL_4']
df = df.loc[:, columns_req]   # keep only the required columns
df = df.rename(columns={col: col.lower() for col in df.columns})
df.to_csv(output_file, sep=',', index=False)

All of the code above works as expected with a smaller CSV, but breaks with a larger one.

Is there any way I can process this?

I have read that I can iterate such as:

foo = pd.read_csv(input_file, iterator=True, chunksize=1000)

But I don't know if this will work as I expect. How do I apply my alterations to foo and then combine all the rows again at the end?

1 Answer


You could read in chunks, as you say. Here is an example, starting by generating a large test file:

import pandas as pd
import numpy as np
import time

# build a large test file: 10 million rows, 11 columns
df = pd.DataFrame(data=np.random.randint(99999, 99999999, size=(10000000, 10)),
                  columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'])
df['K'] = pd.util.testing.rands_array(5, 10000000)  # random 5-character strings
df.to_csv("my_file.csv")

If you read your file the usual way:

start = time.time()
df = pd.read_csv('my_file.csv')
end = time.time()
print("Reading time: ",(end-start),"sec")

Reading time:  20.328343152999878 sec

while creating the chunked reader returns almost instantly. Note that pd.read_csv with chunksize only sets up an iterator; the chunks are read lazily, so the actual work happens when they are consumed (here, inside pd.concat), not in the timed section:

start = time.time()
chunks = pd.read_csv('my_file.csv', chunksize=1000000)
end = time.time()
print("Reading time: ",(end-start),"sec")
pd_df = pd.concat(chunks)  # the file is actually read here, one chunk at a time

Reading time:   0.011000394821166992 sec
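
To the original question of applying the alterations to each chunk and combining everything at the end: with a file this size you usually don't want to rebuild one big DataFrame at all. Here is a minimal sketch, reusing input_file, output_file, and the transformations from your question, that processes each chunk and appends it straight to the output file:

import pandas as pd

columns_req = ['COL_1', 'COL_2', 'COL_3', 'COL_4']

first = True
for chunk in pd.read_csv(input_file, chunksize=1000000):
    chunk = chunk[chunk.col_5 != 'col_5']    # same row filter as in the question
    chunk = chunk.loc[:, columns_req]        # keep only the required columns
    chunk = chunk.rename(columns=str.lower)  # lower-case the headers
    # write the header only for the first chunk, then append without it
    chunk.to_csv(output_file, mode='w' if first else 'a', header=first, index=False)
    first = False

Peak memory stays at roughly one chunk. If you genuinely need the combined DataFrame in memory afterwards, collect the processed chunks in a list and pd.concat them at the end instead of writing to disk.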