I am trying to read a CSV file into a DataFrame, but the CSV is too large to fit in memory and the process is being killed.
I am only trying to make some simple updates to the DataFrame.
This is my current code:
import pandas as pd

df = pd.read_csv(input_file)
df = df[df.col_5 != 'col_5']  # drop rows where col_5 holds the literal header text
columns_req = ['COL_1', 'COL_2', 'COL_3', 'COL_4']
df = df.loc[:, columns_req]  # keep only the required columns
df = df.rename(columns={col: col.lower() for col in df.columns})  # lowercase the headers
df.to_csv(output_file, sep=',', index=False)
All of the code above works as expected on a smaller CSV but breaks on the larger one.
Is there a way to process the large file without loading it all into memory at once?
I have read that I can iterate such as:
foo = pd.read_csv(input_file, iterator=True, chunksize=1000)
But I don't know whether this will work as I expect. How do I apply my alterations to each chunk and then combine all the rows again at the end?
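For reference, here is a sketch of the chunked approach I have in mind: apply the same filter, column selection, and renaming to each chunk, then append each processed chunk to the output file instead of concatenating them in memory. The sample data below (including the mixed-case `COL_1`/`col_5` column names and the `'col_5'` sentinel value) is made up to mirror my code above; file names and the chunk size are placeholders.

```python
import pandas as pd

# Build a small sample CSV so the sketch is self-contained.
# Column names and values here are assumptions mirroring my code above.
sample = pd.DataFrame({
    'COL_1': ['a', 'b', 'c'],
    'COL_2': [1, 2, 3],
    'COL_3': [4, 5, 6],
    'COL_4': [7, 8, 9],
    'col_5': ['x', 'col_5', 'y'],  # 'col_5' marks rows to drop, as in my filter
})
sample.to_csv('input.csv', index=False)

columns_req = ['COL_1', 'COL_2', 'COL_3', 'COL_4']

# Process the file chunk by chunk, writing each processed chunk straight to
# the output file so the full DataFrame never sits in memory at once.
reader = pd.read_csv('input.csv', chunksize=2)  # chunksize is illustrative
for i, chunk in enumerate(reader):
    chunk = chunk[chunk.col_5 != 'col_5']          # same row filter as before
    chunk = chunk.loc[:, columns_req]              # keep only required columns
    chunk.columns = [c.lower() for c in chunk.columns]  # lowercase the headers
    # Write the header and truncate on the first chunk only, then append.
    chunk.to_csv('output.csv', mode='w' if i == 0 else 'a',
                 header=(i == 0), index=False)
```

The key point, as I understand it, is that nothing is "combined" in memory at the end: `to_csv` with `mode='a'` and `header=False` appends each processed chunk, so the output file accumulates the rows as the input is streamed.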