
I have a CSV file with millions of rows. I used to create a dictionary out of the CSV file like this:

import csv

with open('us_db.csv', 'rb') as f:
    data = csv.reader(f)
    for row in data:
        # create a dictionary based on a column
        ...

Now, to filter the rows based on some conditions, I use a pandas DataFrame, as it is super fast at these operations. I load the CSV as a pandas DataFrame and do some filtering. Then I want to continue doing the above. I thought of using pandas `df.iterrows()` or `df.itertuples()`, but they are really slow.

Is there a way to convert the pandas DataFrame to a `csv.reader()` directly, so that I can continue to use the above code? If I use `csv_rows = df.to_csv()`, it gives one long string. Of course, I could write out a CSV file and then read it back in, but I want to know if there is a way to skip the extra read and write to a file.

2 Answers


You could do something like this:

import numpy as np
import pandas as pd
from io import StringIO
import csv

# random DataFrame
df = pd.DataFrame(np.random.randn(3, 4))

buffer = StringIO()  # create an empty in-memory text buffer
df.to_csv(buffer)    # write the DataFrame into it as CSV
buffer.seek(0)       # rewind to the start of the stream

for row in csv.reader(buffer):
    ...  # do stuff with each row (a list of strings)

1 Comment

Thanks, that worked. As I was using Python 2.7, I had to use `BytesIO` instead of `StringIO()`, because I had some problems with UTF-8 encoding.
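If the goal is just to get each row back as plain Python values for building a dictionary, the CSV round-trip can be skipped entirely with `DataFrame.to_dict('records')`, a standard pandas method that yields one dict per row. A minimal sketch (the column names and values here are made up for illustration):

```python
import pandas as pd

# hypothetical filtered DataFrame standing in for the real data
df = pd.DataFrame({'state': ['CA', 'NY'], 'pop': [39, 19]})

# each record is a plain dict keyed by column name,
# so the whole row is available without parsing CSV text
records = df.to_dict('records')
result = {rec['state']: rec['pop'] for rec in records}
# result == {'CA': 39, 'NY': 19}
```

One difference from the buffer approach: `to_dict('records')` preserves the original dtypes, whereas `csv.reader` hands back every field as a string.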

Why don't you apply the Create Dictionary function to the target column? Something like:

df['column_name'] = df['column_name'].apply(create_dictionary)

1 Comment

I need the whole row to be available inside the function. `apply` only sends one value at a time, not one row at a time. Thanks.
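For what it's worth, `apply` can pass whole rows when called on the DataFrame with `axis=1` (a documented pandas parameter): each call then receives the full row as a Series. A small sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# axis=1 passes each row (a Series) to the function,
# so every column of that row is available inside it
df['total'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
# df['total'] is [4, 6]
```

Row-wise `apply` is still a per-row Python call, though, so it is unlikely to be faster than `iterrows()`/`itertuples()` on millions of rows.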
