Reading csv file twice in python

Question

Here is my Python code:

import csv

# Reading
ordersFile = open('orders.csv', 'rb')
ordersR = csv.reader(ordersFile, delimiter=',')

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in ordersR:
    if order[2] == '5' and order[13] == 'Brazil':
        print order
# Find order employeeID=5
print "Find order employeeID=5"
for order in ordersR:
    if order[2] == '5':
        print order
ordersFile.close()

The first search ("Find order employeeID=5, shipCountry="Brazil"") prints results, but the second one ("Find order employeeID=5") prints nothing. How can I read (select) rows from the same CSV file more than once?

You cannot read the same open file twice without rewinding it. Either rewind it before reading for the second time, or close it and open it again. Better yet, compute everything you need when you read the file for the first time.
– DYZ, Commented Sep 6, 2017 at 20:12

wkl · Accepted Answer · 2017-09-06 20:15:08Z

10

Your first loop reads right through to the end of the CSV file, so the second loop has nothing left to read. If you want to work on the data in multiple passes, read the contents into a variable first. Then you don't have to re-read the file every time you need to do something with it.

import csv

# Read order rows into our list
# Here I use a context manager so that the file is automatically
# closed upon exit
with open('orders.csv') as orders_file:
    reader = csv.reader(orders_file, delimiter=',')
    orders = list(reader)

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in orders:
    if order[2] == '5' and order[13] == 'Brazil':
        print order

# Find order employeeID=5
print "Find order employeeID=5"
for order in orders:
    if order[2] == '5':
        print order

If your CSV file is too huge to fit into memory (or you don't want to read it all into memory for some reason), then you'll need a different approach. If you need that, please leave a comment.
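One possible "different approach" for files that do not fit in memory is to stream the rows and re-open the file for each pass. The sketch below is only an illustration written for Python 3: the helper name matching_rows, the temporary sample file, and its three-column layout (orderID, employeeID, shipCountry) are assumptions, not the layout of the real orders.csv.

```python
import csv
import os
import tempfile


def matching_rows(path, predicate):
    """Lazily yield the rows of the CSV at `path` for which predicate(row) is true."""
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter=','):
            if predicate(row):
                yield row


# Hypothetical sample data standing in for orders.csv:
# columns are orderID, employeeID, shipCountry
sample = "1,5,Brazil\n2,5,France\n3,7,Brazil\n"
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)
    path = tmp.name

# Each call opens the file again, so only one row is held in memory at a time
brazil_5 = list(matching_rows(path, lambda r: r[1] == '5' and r[2] == 'Brazil'))
emp_5 = list(matching_rows(path, lambda r: r[1] == '5'))

os.remove(path)
print(brazil_5)
print(emp_5)
```

The trade-off is one full read of the file per query, in exchange for roughly constant memory use.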

edited Sep 6, 2017 at 20:15

answered Sep 6, 2017 at 20:13

3 Comments

DYZ Over a year ago

Isn't for row in reader:orders.extend(row) just orders=list(reader)?

user6142261 Over a year ago

@birryree Thanks, and could you share a different approach as well?

wkl Over a year ago

@user6142261 - would you say all your stuff can be done in one pass (if so, other answers have that covered)? Do you not want to read everything into memory? Are you open to using third-party libs? Do you have huge CSV files?

Nick T · Accepted Answer · 2017-09-06 20:13:39Z

4

It's better to read through files once because I/O is likely to be the slowest part of your program.

If you need to re-read the file, you can either close it and re-open it, or seek() to the beginning, i.e. add ordersFile.seek(0) between your loops.
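A minimal sketch of the seek(0) option, written for Python 3. It uses an in-memory io.StringIO as a stand-in for the opened orders.csv, and the three-column sample layout (orderID, employeeID, shipCountry) is an assumption made for illustration. A fresh csv.reader is created after rewinding to keep the two passes clearly separate.

```python
import csv
import io

# Stand-in for open('orders.csv', newline=''); the columns are hypothetical:
# orderID, employeeID, shipCountry
orders_file = io.StringIO("1,5,Brazil\n2,5,France\n3,7,Brazil\n")

# First pass: employeeID=5 and shipCountry="Brazil"
first_pass = [row for row in csv.reader(orders_file)
              if row[1] == '5' and row[2] == 'Brazil']

# Rewind to the beginning of the file before the second pass
orders_file.seek(0)

# Second pass: employeeID=5
second_pass = [row for row in csv.reader(orders_file) if row[1] == '5']

orders_file.close()
print(first_pass)
print(second_pass)
```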

answered Sep 6, 2017 at 20:13

PRMoureu · Accepted Answer · 2017-09-06 20:14:11Z

3

You can simply convert the reader object into a list:

import csv

with open('orders.csv', 'rb') as ordersFile:
    ordersR = list(csv.reader(ordersFile, delimiter=','))

The reader object behaves like a generator: once you have iterated over its values, you cannot start a second loop to read them again.
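The exhaustion behaviour can be seen in a small Python 3 sketch. The in-memory io.StringIO and its two sample rows are assumptions standing in for the real file.

```python
import csv
import io

# Hypothetical two-row sample standing in for orders.csv
data = io.StringIO("1,5,Brazil\n2,5,France\n")
reader = csv.reader(data)

first = list(reader)   # consumes every row
second = list(reader)  # the reader is already exhausted, so nothing is left

print(len(first), len(second))
```

The second list is empty for the same reason the second loop in the question prints nothing.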

answered Sep 6, 2017 at 20:14

hiro protagonist · Accepted Answer · 2017-09-06 20:32:31Z

1

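If you do not want to store all your data in a list yourself, this is a generator-based approach to iterating over your CSV file twice, using itertools.tee. Note that because the first loop finishes before the second one starts, tee has to buffer every row internally, so this does not save memory compared with a list: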

from csv import reader
from itertools import tee

with open('orders.csv', 'r') as file:
    rows0, rows1 = tee(reader(file, delimiter=','))

    for row in rows0:
        print(row)  # search for something...

    print()

    for row in rows1:
        print(row)  # search for a different thing...

answered Sep 6, 2017 at 20:32

Mohamed Ali JAMAOUI · Accepted Answer · 2017-09-06 20:14:15Z

0

This is a good case for using the pandas module (you need to install it: pip install pandas).

After that, you read the file just once and can perform any kind of filtering easily.

For instance, to read the file once and filter it more than once, follow this example:

import pandas as pd

# read csv into a dataframe
df = pd.read_csv('orders.csv', delimiter=',')

# get the data that has employeeID == 5
df1 = df[df["employeeID"] == 5]
print(df1)

# get the data that has employeeID == 5 and shipCountry == "Brazil"
df2 = df[(df["employeeID"] == 5) & (df["shipCountry"] == "Brazil")]
print(df2)

answered Sep 6, 2017 at 20:14

Chen A. · Accepted Answer · 2017-09-06 20:41:06Z

As @Nick T mentioned above, I/O is expensive compared to RAM access, so if you need to iterate over your file more than once, it is better to save its contents to a variable.

You can also combine multiple conditions in a single for loop, so that everything is collected in one iteration:

import csv

with open('orders.csv', 'rb') as ordersFile:
    orders = list(csv.reader(ordersFile, delimiter=','))

# Collect both result sets in a single pass over the rows
emp = []      # every order with employeeID=5
country = []  # orders with employeeID=5 and shipCountry="Brazil"
for order in orders:
    if order[2] == '5':
        emp.append(order)
        if order[13] == 'Brazil':
            country.append(order)

print 'emp id=5 and shippingcountry=Brazil: {}'.format(country)
print 'emp id=5: {}'.format(emp)

Note that this doesn't scale well: you probably don't want to add more if logic to this block, because it quickly becomes unreadable.
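If more conditions are needed, one way to keep a single pass readable is to name each filter and loop over them. This is only a sketch for Python 3: the filters dictionary, its labels, and the three-column sample layout (orderID, employeeID, shipCountry) are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical named filters; each takes a row and returns True or False
filters = {
    'emp id=5': lambda r: r[1] == '5',
    'emp id=5 and shipCountry=Brazil': lambda r: r[1] == '5' and r[2] == 'Brazil',
}
results = {name: [] for name in filters}

# Stand-in for open('orders.csv', newline=''); columns are hypothetical:
# orderID, employeeID, shipCountry
source = io.StringIO("1,5,Brazil\n2,5,France\n3,7,Brazil\n")

# Single pass: every row is tested against every filter
for row in csv.reader(source):
    for name, keep in filters.items():
        if keep(row):
            results[name].append(row)

for name, rows in results.items():
    print('{}: {}'.format(name, rows))
```

Adding another query then means adding one dictionary entry rather than another nested if.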
